Preface



The Cryptographers’ Track (CT-RSA) is a research conference within the RSA 
conference, the largest, regularly staged computer security event. CT-RSA 2004 
was the fourth year of the Cryptographers’ Track, and it is now an established 
venue for presenting practical research results related to cryptography and data 
security. 

The conference received 77 submissions, and the program committee selected
28 of these for presentation. The program committee worked very hard to
evaluate the papers with respect to quality, originality, and relevance to
cryptography. Each paper was reviewed by at least three program committee members.
Extended abstracts of the revised versions of these papers are in these
proceedings. The program also included two invited lectures by Dan Boneh and Silvio
Micali.

I am extremely grateful to the program committee members for their enormous
investment of time and effort in the difficult and delicate process of review
and selection. Many of them attended the program committee meeting during
the Crypto 2003 conference at the University of California, Santa Barbara.

I gratefully acknowledge the help of a large number of colleagues who
reviewed submissions in their area of expertise: Masayuki Abe, Toru Akishita,
Kazumaro Aoki, Gildas Avoine, Joonsang Baek, Harald Baier, Alex Biryukov,
Dario Catalano, Xiaofeng Chen, Benoit Chevallier-Mames, J.S. Coron, Christophe
De Canniere, Alex Dent, J.-F. Dhem, Matthias Fitzi, Marc Fossorier, Steven
Galbraith, Pierrick Gaudry, Craig Gentry, Shai Halevi, Helena Handschuh, Javier
Herranz Sotoca, Doi Hiroshi, Thomas Holenstein, Tetsu Iwata, Tetsuya Izu,
Miodrag J. Mihaljevic, Jacques J.A. Fournier, Markus Jakobsson, Dominic Jost,
Pascal Junod, Naoki Kanayama, Hiroki Koga, Yuichi Komano, Hugo Krawczyk,
Dennis Kuegler, Noboru Kunihiro, Eyal Kushilevitz, Yi Lu, Christoph Ludwig,
Philip MacKenzie, Keith Martin, Kazuto Matsuo, Jean Monnerat, Shiho Moriai,
Christophe Mourtel, Sean Murphy, David Naccache, Koh-Ichi Nagao, Anderson
Nascimento, Wakaha Ogata, Kenji Ohkuma, Satomi Okazaki, Elisabeth
Oswald, Daniel Page, Kenny Paterson, Krzysztof Pietrzak, Zulfikar Ramzan,
Renato Renner, Taiichi Saito, Ryuichi Sakai, Kouichi Sakurai, Arthur Schmidt,
Katja Schmidt-Samoa, Junji Shikata, Atsushi Shimbo, Johan Sjodin, Ron Steinfeld,
Makoto Sugita, Masahiko Takenaka, Jin Tamura, Bogdan Warinschi, Kai
Wirt, Xun Yi, and Rui Zhang.

Electronic submissions were made possible by the Web Review system of K.U. 
Leuven. I would like to thank Bart Preneel for his kind support. Special thanks to 
Thomas Herlea, who greatly supported us by operating the Web Review system 
customized for CT-RSA 2004. 

In addition, I would like to thank Mami Yamaguchi for her support in the 
review process and in editing these proceedings. 
I am especially grateful to Burt Kaliski and Ari Juels of RSA Laboratories for
interfacing with the RSA conference. 

I wish to thank all the authors, who by submitting papers made this
conference possible, and the authors of accepted papers for their cooperation.



December 2003 



Tatsuaki Okamoto 
Program Chair 
CT-RSA 2004 
Online Encryption Schemes: New Security
Notions and Constructions 



Alexandra Boldyreva and Nut Taesombut 

Dept. of Computer Science & Engineering,
University of California at San Diego, 9500 Gilman Drive, 
La Jolla, California 92093, USA. 

{aboldyre,ntaesomb}@cs.ucsd.edu
http://www-cse.ucsd.edu/users/{aboldyre,ntaesomb}



Abstract. We investigate new strong security notions for on-line symmetric
encryption schemes, which are the schemes whose encryption and
decryption algorithms operate “on-the-fly” and in one pass, namely, they can
compute and return an output block given only the key, the current input
block and the previous input and output blocks. We define the strongest
achievable notion of privacy, which takes into account both chosen-ciphertext
attacks and the recently introduced blockwise-adaptive [15,12]
attacks. We show that all the schemes shown to be secure against
blockwise-adaptive chosen-plaintext attacks are subject to blockwise-adaptive
chosen-ciphertext attacks. We present an on-line encryption
scheme which is provably secure under our notion. It uses any strong on-line
cipher, the primitive introduced in [1]. We finally discuss the notion
of authenticated on-line schemes and provide a secure construction.



1 Introduction 

Online Encryption Schemes. In this work we investigate strong security
notions for on-line symmetric encryption schemes and analyze various
constructions.

The on-line property of a function requires that each output block depend
only on the key, the previous input and output blocks and the current input
block. We say that an encryption scheme is on-line if for each key its encryption
and decryption algorithms are both on-line¹. We provide a more formal definition
of on-line encryption schemes in Section 3. The on-line property allows schemes
to be used in applications where encryption and decryption should operate
“on-the-fly”, in one pass, i.e. an output block should be returned as soon as the next
input block is received. Example situations include the use of encryption modes
in the SSH protocol and implementations in tamper-proof devices with low memory.

¹ Note that in the literature, on-line encryption schemes require only the encryption
algorithm to be on-line.
Most existing symmetric encryption schemes, e.g. the CBC and CTR modes, are
on-line².

Privacy of on-line schemes. In the model for the standard notion of privacy
(indistinguishability) against chosen-plaintext attacks (IND-CPA) [2], an
adversary runs in two stages. In both stages it is given a symmetric encryption oracle.
During the first stage the adversary has to output two equal-length messages.
At the last stage it gets a challenge ciphertext, which is an encryption of one of
the two messages. It wins if it can distinguish which message was encrypted.

As has recently been observed by Joux, Martinet and Valette in [15], the
standard notion of IND-CPA security treats encryption and decryption as “atomic”
operations and is therefore not strong enough for on-line encryption schemes,
whose algorithms can be executed in an on-line (“non-atomic”) manner³. For
the same reason, the existing proofs of security done in the standard model
cannot guarantee the security of an on-line scheme under on-line execution. To model
the privacy of schemes with on-line encryption, [15] takes into account
so-called “blockwise-adaptive attacks”, in which an adversary is allowed to generate
two equal-length messages block by block, each time seeing the corresponding
portion of the challenge ciphertext and being able to “adapt” the next blocks of
the messages as a function of the ciphertext blocks received so far.

The authors noted that blockwise-adaptive attacks do not seem to apply to
fully parallelizable “not chained” encryption schemes such as CTR mode, but
showed that some popular chained encryption schemes, including CBC, are in
fact insecure against such attacks. The notion of security of on-line encryption
schemes against blockwise-adaptive chosen-plaintext attacks (IND-BLK-CPA)
was defined more formally by Fouque, Martinet and Poupard in [12]. They
prove that the CFB encryption mode [18] and DCBC, a modification of the CBC
scheme, are IND-BLK-CPA secure.
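To make the weakness of CBC concrete, here is a minimal Python sketch of one way to mount a blockwise-adaptive chosen-plaintext attack on it. Everything in it is an assumption made for illustration: the 16-byte toy block cipher E (a 4-round Feistel network with HMAC-SHA256 round functions, not a vetted cipher) and the helper names OnlineCBC and blockwise_cpa_game. The adversary picks distinct first blocks M0[1] ≠ M1[1]; after seeing IV and C[1] it chooses the same second block M0[2] = M1[2] = C[1] ⊕ IV ⊕ M0[1] for both messages, so C[2] = E(IV ⊕ M0[1]), which equals C[1] exactly when b = 0.

```python
import hashlib
import hmac
import os

N = 16  # toy block length in bytes (hypothetical stand-in for n bits)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(key: bytes, block: bytes) -> bytes:
    """Toy block cipher: 4-round Feistel network with HMAC-SHA256 round functions."""
    left, right = block[: N // 2], block[N // 2 :]
    for r in range(4):
        f = hmac.new(key, bytes([r]) + right, hashlib.sha256).digest()[: N // 2]
        left, right = right, xor(left, f)
    return left + right


class OnlineCBC:
    """CBC encryption executed block by block: C[0] = IV, C[i] = E(M[i] xor C[i-1])."""

    def __init__(self, key: bytes, iv: bytes):
        self.key, self.prev = key, iv

    def next_block(self, m: bytes) -> bytes:
        self.prev = E(self.key, xor(m, self.prev))
        return self.prev


def blockwise_cpa_game(b: int) -> int:
    """One run of the blockwise-adaptive game with challenge bit b; returns the guess."""
    key, iv = os.urandom(16), os.urandom(N)
    enc = OnlineCBC(key, iv)
    m0_1, m1_1 = bytes(N), b"\x01" * N        # distinct first message blocks
    c1 = enc.next_block((m0_1, m1_1)[b])      # adversary now sees IV and C[1]
    m_2 = xor(xor(c1, iv), m0_1)              # same second block for both messages
    c2 = enc.next_block(m_2)                  # C[2] = E(IV xor M0[1])
    return 0 if c2 == c1 else 1


print([blockwise_cpa_game(b) for b in (0, 1, 0, 1)])  # prints [0, 1, 0, 1]
```

The guess is always correct because E is a permutation: C[2] = C[1] holds if and only if the first challenge block was M0[1]. The attack needs the adaptivity, since the second block depends on C[1].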

But nowadays preserving privacy only against chosen-plaintext attacks is
not considered sufficient; it is desirable to achieve stronger notions of privacy,
such as security against chosen-ciphertext attacks. The standard notion of
privacy against chosen-ciphertext attacks (IND-CCA) is similar to the one against
chosen-plaintext attacks (IND-CPA), except that the adversary is also given a
decryption oracle, with the restriction that it may not query it on the challenge
ciphertext. The IND-CCA notion seems insufficient for on-line schemes since it does not
allow blockwise-adaptive attacks. Security of on-line schemes against
blockwise-adaptive chosen-ciphertext attacks has not been previously investigated. This is
the focus of our work.

² These modes can also be executed in a non-on-line manner, outputting the whole
ciphertext (resp. plaintext) only after the encryption (resp. decryption) algorithm
completes.

³ If an on-line scheme is implemented only for “atomic” (not on-line) execution, then
the standard notions of security are appropriate. In this paper, when we refer to an
on-line scheme, we always assume the possibility of “non-atomic” execution of both
the encryption and decryption algorithms of the scheme (when the output blocks are
returned on-the-fly).
Related work. Very recently, in independent work, Fouque et al. [11]
investigated the notions of secure authenticated encryption for applications using
low-memory, highly secure devices such as smart cards together with insecure
but powerful host devices. These applications have been the topic of previous
research known as “remotely-keyed encryption” (RKE) and “remotely-keyed
authenticated encryption” (RKAE). The works in these areas include [6,17,7,14,9].
The neat scheme proposed in [11] allows a smart card to partially decrypt
a ciphertext produced by any secure authenticated encryption scheme (most of
which have on-line encryption algorithms) in an on-line manner while preserving
privacy and authenticity. However, the host needs to perform a second pass over
the data to complete the decryption (this phase is keyless). Hence, their scheme
(as well as the schemes of [6,17,7,14,9]) does not fall under our definition of an
on-line scheme, since its decryption algorithm is not one-pass.

The authors of [11] attempt to define the notion of privacy against blockwise- 
adaptive chosen-ciphertext attacks. While their notion is appropriate for schemes 
with on-line encryption, it is unachievable for schemes with on-line decryption, 
as we show below. Therefore, a new security notion is needed for analysis of 
on-line encryption schemes beyond chosen-plaintext attacks. 

Towards a stronger notion of privacy for on-line schemes. As we
mentioned above, we seek an appropriate notion of privacy under
blockwise-adaptive chosen-ciphertext attacks for on-line encryption schemes, together with
secure constructions. It turns out that answering these theoretically interesting
and practically important questions requires some extra care.

For example, the approach taken by Fouque et al. [11] was to strengthen the
IND-BLK-CPA notion defined by [15,12] by giving the adversary a particular
decryption oracle which operates block by block, namely, it takes ciphertext blocks
and returns the corresponding plaintext blocks “on-the-fly”. Indeed, this is the
first definition that comes to mind. However, we point out that giving such a
decryption oracle is equivalent to giving the adversary the standard decryption
oracle. This can be seen by noting that, unlike encryption, the decryption
algorithm is always deterministic, and any “blockwise-adaptive” queries C[1], C[2], ...
to the particular decryption oracle can be replaced by queries C[1], C[1]||C[2], ...
to the standard decryption oracle.
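The equivalence can be checked on a toy example. The Python sketch below assumes a CFB-style scheme with HMAC-SHA256 as the keyed function (illustrative only; all helper names are hypothetical) and rebuilds block-by-block answers purely from standard decryption queries on the prefixes C[1], C[1]||C[2], .... This works because decryption is deterministic and on-line, so each prefix decrypts to a prefix of the full plaintext.

```python
import hashlib
import hmac
import os

N = 16  # toy block length in bytes (hypothetical)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def prf(key: bytes, block: bytes) -> bytes:
    # Stand-in keyed function: HMAC-SHA256 truncated to one block (assumption).
    return hmac.new(key, block, hashlib.sha256).digest()[:N]


def blocks(x: bytes) -> list:
    return [x[i : i + N] for i in range(0, len(x), N)]


def cfb_decrypt(key: bytes, c: bytes) -> bytes:
    """Standard (atomic) decryption of a toy CFB ciphertext C[1]||...||C[l], C[1] = IV."""
    cs = blocks(c)
    return b"".join(xor(cs[i], prf(key, cs[i - 1])) for i in range(1, len(cs)))


def blockwise_via_prefixes(dec_oracle, c: bytes) -> list:
    """Emulate a block-by-block decryption oracle using only prefix queries."""
    cs, out = blocks(c), []
    for j in range(1, len(cs) + 1):
        m = dec_oracle(b"".join(cs[:j]))  # query C[1]||...||C[j]
        out.extend(blocks(m)[len(out):])  # keep only the newly revealed plaintext blocks
    return out


key = os.urandom(32)
c = os.urandom(4 * N)  # any string of whole blocks decrypts under this toy CFB
assert b"".join(blockwise_via_prefixes(lambda x: cfb_decrypt(key, x), c)) == cfb_decrypt(key, c)
```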

Moreover, we claim that extending the IND-BLK-CPA notion with chosen-ciphertext
attacks by giving the adversary the decryption oracle (which is
equivalent to the notion proposed in [11]) is too strong for analyzing on-line
encryption schemes. We show this by noting that no on-line encryption scheme can
be secure even under the standard IND-CCA notion. We justify this claim in
Section 3 by presenting a simple adversary which does not use any
blockwise-adaptive attacks. Therefore, the standard IND-CCA notion and, moreover, its
straightforward extension to blockwise-adaptive attacks are unachievable for
on-line schemes.

We seek the strongest achievable privacy notion for on-line encryption
schemes. In Section 3 we provide such a notion. The model takes into account both
blockwise-adaptive chosen-plaintext attacks, using the ideas from [15,12], and a
class of chosen-ciphertext attacks, by giving the adversary a special
decryption oracle. The queries to this special decryption oracle are answered in such
a way that the notion is achievable. Please refer to Definition 1 for the details
of this security notion, which we call IND-BLK-CCA. One may argue that our
notion does not exactly capture the whole class of chosen-ciphertext attacks.
This is true, but one should realize that any scheme with on-line encryption
and decryption execution is always subject to some chosen-ciphertext attacks, as
we claimed above, and therefore the designers and implementors of on-line
encryption schemes should not target IND-CCA security. This does not mean,
however, that IND-BLK-CPA security will be sufficient. Our notion captures the
best possible security for schemes with on-line execution.

Attacks. In Section 4 we show that the schemes proved to resist blockwise- 
adaptive chosen-plaintext attacks (shown in [12] to be IND-BLK-CPA secure) 
are subject to blockwise-adaptive chosen-ciphertext attacks. Namely, we show 
that CFB and DCBC encryption modes are not IND-BLK-CCA secure. These 
attacks are very simple and are similar to the well-known IND-CCA attacks on 
these schemes. The result is not surprising since these schemes were not designed 
to resist such powerful attacks. 

IND-BLK-CCA secure construction. If one does not have the goal of
having an on-line symmetric encryption scheme, it is easy to achieve
IND-CCA security. Given an IND-CPA secure symmetric encryption scheme SE,
e.g. CBC, and a message authentication code F strongly unforgeable under
chosen-message attack (SUF-CMA secure), e.g. XCBC [5] or OMAC [13]⁴, one
can obtain an IND-CCA secure scheme via the “encrypt-then-MAC” paradigm [4].
Namely, in order to encrypt a message, first use the encryption algorithm of SE
to get a ciphertext C and then apply the tag generation algorithm of F to C to
get the tag τ. The resulting ciphertext is C||τ. In order to decrypt a ciphertext
C', parse it as C''||τ', verify the tag τ' using the verification algorithm of F,
and if it is valid, decrypt C'' using the decryption algorithm of SE and return the
result; otherwise, return a special symbol ⊥. However, this “encrypt-then-MAC”
paradigm does not preserve the on-line property of the given encryption
scheme. More precisely, the on-line property of decryption is violated: the
resulting decryption algorithm needs the whole ciphertext to verify the tag. The
same problem holds for specific authenticated encryption schemes such as, for
example, the OCB authenticated encryption scheme [19], IACBC, IAPM [16] and XCBC [10],
which are not constructed using the generic “encrypt-then-MAC” paradigm.
They provide both privacy (IND-CPA security) and authenticity (INT-CTXT
security) and hence IND-CCA security [4]⁵. However, their decryption
algorithm is not on-line: before being able to output the message, it first needs to
verify the checksum, which depends on all the blocks of the message.
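A minimal Python sketch of the encrypt-then-MAC composition described above, assuming a toy CTR-style scheme in the role of SE (HMAC-SHA256 as the keystream generator) and HMAC-SHA256 in the role of the MAC F. These choices and all function names are illustrative assumptions, not the constructions cited above. The comment in etm_decrypt marks where the on-line property is lost.

```python
import hashlib
import hmac
import os
from typing import Optional

N = 16    # toy block length in bytes (hypothetical)
TAG = 32  # HMAC-SHA256 tag length in bytes


def keystream(ke: bytes, r: bytes, length: int) -> bytes:
    # Toy CTR keystream: block i is HMAC-SHA256(ke, r || i) truncated to N bytes (assumption).
    nblocks = -(-length // N)
    ks = b"".join(hmac.new(ke, r + i.to_bytes(8, "big"), hashlib.sha256).digest()[:N]
                  for i in range(nblocks))
    return ks[:length]


def etm_encrypt(ke: bytes, km: bytes, m: bytes) -> bytes:
    r = os.urandom(N)
    c = r + bytes(a ^ b for a, b in zip(m, keystream(ke, r, len(m))))  # IND-CPA part
    tau = hmac.new(km, c, hashlib.sha256).digest()                      # tag over all of C
    return c + tau                                                      # C || tau


def etm_decrypt(ke: bytes, km: bytes, ct: bytes) -> Optional[bytes]:
    if len(ct) < N + TAG:
        return None
    c, tau = ct[:-TAG], ct[-TAG:]
    # The tag covers the whole of C, so it can be checked only after the last block
    # has arrived; no plaintext block may be released earlier, hence not on-line.
    if not hmac.compare_digest(tau, hmac.new(km, c, hashlib.sha256).digest()):
        return None  # plays the role of the reject symbol ⊥
    r, body = c[:N], c[N:]
    return bytes(a ^ b for a, b in zip(body, keystream(ke, r, len(body))))


ke, km = os.urandom(32), os.urandom(32)
ct = etm_encrypt(ke, km, b"sixteen byte msg" * 2)
assert etm_decrypt(ke, km, ct) == b"sixteen byte msg" * 2
assert etm_decrypt(ke, km, ct[:-1] + bytes([ct[-1] ^ 1])) is None
```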

⁴ These MACs are shown to be weakly unforgeable (WUF-CMA secure); however, for
stateless deterministic MACs these two notions are equivalent.

⁵ It is shown in [4] that if an encryption scheme is secure against chosen-plaintext
attacks (IND-CPA secure) and provides integrity of ciphertexts (INT-CTXT secure),
then it is also secure against chosen-ciphertext attacks (IND-CCA secure).
We look for an on-line encryption scheme provably secure against
blockwise-adaptive chosen-ciphertext attacks. As a building block we use on-line ciphers,
the notion introduced and analyzed by Bellare, Boldyreva, Namprempre and
Knudsen in [1]. On-line ciphers are on-line, length-preserving permutations.
Secure on-line ciphers are the ones which closely approximate truly random
on-line length-preserving permutations. The authors of [1] formally define the
security notion for on-line ciphers against chosen-ciphertext attacks and present a
construction called HPCBC, which is secure against chosen-ciphertext attacks
assuming the underlying block cipher and the hash function are secure (the
construction uses one block cipher and one keyed hash function invocation per input
block). On-line ciphers are deterministic permutations and therefore are not even
IND-CPA secure. The authors discuss how on-line ciphers can be used to obtain
IND-CPA and INT-CTXT secure schemes. As we discussed above and as
independently noted in [11], this does not immediately guarantee IND-BLK-CCA
security.

In Section 5 we present our second main result. We give an on-line encryption
scheme which is constructed using any on-line cipher. We prove it IND-BLK-CCA
secure assuming the underlying cipher is a strong pseudorandom on-line
permutation. Theorem 1 states the concrete security result. We next present a
specific scheme based on the HPCBC on-line cipher.

Authenticated on-line encryption schemes. As we discussed above, all 
known schemes that provide authenticity (INT-CTXT security) are not on-line. 
Moreover, the standard notion of authenticity INT-CTXT is not appropriate 
to analyze on-line encryption schemes because it assumes that decryption is an 
atomic operation. In the full version of this paper [8] we define an appropriate 
notion of authenticity for on-line encryption schemes and provide an encryption 
scheme which simultaneously provides authenticity and IND-BLK-CCA security. 

2 Preliminaries 

Notation. A string is a member of {0,1}*. Let |X| denote the length of a string
X. For strings X, Y let X||Y denote their concatenation. The empty string is
denoted ε. If d ≥ 1, n ≥ 1 are integers, then D_{d,n} denotes the set of all strings
whose length is a positive multiple of n bits and at most dn bits (we borrow some
notation from [1] for convenience). If X ∈ D_{d,n}, then X[i] denotes its i-th block,
meaning X = X[1]||...||X[l], where l = |X|/n and |X[i]| = n for all i = 1, ..., l.
We will mostly consider functions with inputs and outputs in D_{d,n}, hence both
can be viewed as sequences of n-bit-long blocks.
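For readers who want to experiment, the notation maps directly onto a few lines of Python. This sketch assumes bit strings are represented as Python str values over '0' and '1'; the function names in_D and block are hypothetical.

```python
def in_D(x: str, d: int, n: int) -> bool:
    """x in D_{d,n}: a bit string whose length is a positive multiple of n and at most d*n."""
    return set(x) <= {"0", "1"} and 0 < len(x) <= d * n and len(x) % n == 0


def block(x: str, i: int, n: int) -> str:
    """X[i]: the i-th n-bit block of x, 1-indexed as in the text."""
    return x[(i - 1) * n : i * n]


x = "0110" + "1001" + "1111"  # three 4-bit blocks
l = len(x) // 4
assert in_D(x, d=3, n=4) and not in_D(x, d=2, n=4) and not in_D("", d=3, n=4)
assert block(x, 2, 4) == "1001"
assert "".join(block(x, i, 4) for i in range(1, l + 1)) == x  # X = X[1]||...||X[l]
```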

Ciphers and on-line ciphers. Informally, a cipher is a keyed deterministic
length-preserving permutation. We recall the formal definitions presented in [1].
Let F: Keys(F) × Dom(F) → Rng(F) be a function family, where Keys(F)
is the key space of F, Dom(F) is the domain of F, and Rng(F) is the range
of F. F is a cipher if Dom(F) = Rng(F) and each instance F(K, ·) of F is a
length-preserving permutation. If F is a cipher, then F^{-1} is the inverse cipher,
defined by F^{-1}(K, x) = F(K, ·)^{-1}(x) for all K ∈ Keys(F) and x ∈ Dom(F).
A function f: D_{d,n} → D_{d,n} is n-on-line if the first block of the output
depends only on the first block of the input, and the i-th block of the output (for
i ≥ 2) is determined completely by the (i−1)-th block of the input, the (i−1)-th
block of the output and the i-th block of the input⁶. A cipher is n-on-line if each
instance is an on-line function. It is shown in [1] that the inverse of an on-line
cipher is itself n-on-line. For convenience we will often omit n in the “n-on-line”
term.
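A toy Python sketch can make the definition concrete. It assumes a hypothetical 16-byte Feistel block cipher E built from HMAC-SHA256 and uses CBC with a fixed all-zero IV as the on-line permutation: each output block is determined by the current input block and the previous output block (the definition also allows dependence on the previous input block, which this toy does not use). The toy is chosen only because its on-line structure is easy to read; no claim is made that it is a secure on-line cipher.

```python
import hashlib
import hmac
import os

N = 16  # toy block length in bytes (stands in for n bits)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, r: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[: N // 2]


def E(key: bytes, blk: bytes) -> bytes:
    """Toy block cipher (4-round Feistel); an assumption, not a vetted design."""
    left, right = blk[: N // 2], blk[N // 2 :]
    for r in range(4):
        left, right = right, xor(left, _round(key, r, right))
    return left + right


def E_inv(key: bytes, blk: bytes) -> bytes:
    left, right = blk[: N // 2], blk[N // 2 :]
    for r in reversed(range(4)):
        left, right = xor(right, _round(key, r, left)), left
    return left + right


def online_enc(key: bytes, m: bytes) -> bytes:
    """Zero-IV CBC: output block i depends only on input block i and output block i-1."""
    prev, out = bytes(N), []
    for i in range(0, len(m), N):
        prev = E(key, xor(m[i : i + N], prev))
        out.append(prev)
    return b"".join(out)


def online_dec(key: bytes, c: bytes) -> bytes:
    """Inverse: M[i] = E^-1(C[i]) xor C[i-1], which is again on-line."""
    prev, out = bytes(N), []
    for i in range(0, len(c), N):
        out.append(xor(E_inv(key, c[i : i + N]), prev))
        prev = c[i : i + N]
    return b"".join(out)


key = os.urandom(16)
m1 = os.urandom(3 * N)
m2 = m1[: 2 * N] + os.urandom(N)                 # same first two blocks as m1
c1, c2 = online_enc(key, m1), online_enc(key, m2)
assert c1[: 2 * N] == c2[: 2 * N]                 # common input prefix -> common output prefix
assert len(c1) == len(m1) and online_dec(key, c1) == m1
```

The prefix check is exactly what makes a deterministic on-line cipher fail IND-CPA on its own: equal message prefixes are visible as equal ciphertext prefixes.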



Security of on-line ciphers. Let OPerm_{d,n} denote the family of all n-on-line,
length-preserving permutations on D_{d,n}. A “secure” on-line cipher is one that
closely approximates OPerm_{d,n}. Let F: Keys(F) × D_{d,n} → D_{d,n} be a family of
functions with domain and range D_{d,n}. Let A be a distinguisher with access to
two oracles.

We define the advantage of A attacking F as an on-line pseudorandom permutation
(OPRP) under chosen-ciphertext attacks as

$$\mathbf{Adv}^{\mathrm{oprp\text{-}cca}}_{F}(A) = \Pr\left[K \xleftarrow{\$} \mathrm{Keys}(F) : A^{F(K,\cdot),\,F^{-1}(K,\cdot)} = 1\right] - \Pr\left[g \xleftarrow{\$} \mathrm{OPerm}_{d,n} : A^{g(\cdot),\,g^{-1}(\cdot)} = 1\right].$$

This captures the advantage of the distinguisher in the task of distinguishing a
random instance of F from a random, length-preserving, n-on-line permutation
on D_{d,n}. The distinguisher can query the challenge instance and its inverse.
For any t, q_e, μ_e, q_d, μ_d we define the advantage of F as an OPRP against
chosen-ciphertext attacks as

$$\mathbf{Adv}^{\mathrm{oprp\text{-}cca}}_{F}(t, q_e, \mu_e, q_d, \mu_d) = \max_{A}\left\{\mathbf{Adv}^{\mathrm{oprp\text{-}cca}}_{F}(A)\right\},$$

where the maximum is over all A having time complexity t, making at most
q_e queries to the challenge instance oracle, whose total length is at most μ_e, and
making at most q_d queries to the challenge inverse oracle, whose total length is
at most μ_d. An on-line cipher F is secure against chosen-ciphertext attacks (is
a strong on-line cipher) if the function Adv^{oprp-cca}_F(·) grows “slowly”.

Symmetric encryption schemes and their security. A symmetric
encryption scheme SE = (𝒦, ℰ, 𝒟) consists of three algorithms. The randomized
key generation algorithm 𝒦 returns a key K; we write K ←$ 𝒦. Associated with
each encryption scheme there is a message space MsgSp, from which messages
are drawn. In our context it will be important to make explicit the random coins
underlying the randomized encryption algorithm ℰ. On input a key K, a
plaintext M ∈ MsgSp, and coin tosses R, the randomized encryption algorithm ℰ
returns the ciphertext C; we write C ← ℰ_K(M; R). The notation C ←$ ℰ_K(M) is
shorthand for R ←$ Coins_ℰ; C ← ℰ_K(M; R), where Coins_ℰ is the set from which
randomness is drawn. The deterministic decryption algorithm 𝒟 takes K and a
ciphertext C and returns the message M or a symbol ⊥; we write M ← 𝒟_K(C)
or ⊥ ← 𝒟_K(C). We require that 𝒟_K(ℰ_K(M)) = M for all M ∈ MsgSp.
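As a concrete reading of this syntax, the Python sketch below makes the coins R an explicit argument of encryption, so that ℰ_K(M; R) is a deterministic function and ℰ_K(M) simply draws fresh coins first. The toy counter-mode instance (HMAC-SHA256 keystream) and the class and method names are hypothetical, chosen only to illustrate the interface.

```python
import hashlib
import hmac
import os

N = 16  # toy block length in bytes (hypothetical)


class ToyCTR:
    """SE = (K, E, D) with the coins R of E made explicit; toy CTR mode (assumption)."""

    def keygen(self) -> bytes:                                  # K <-$ K
        return os.urandom(32)

    def coins(self) -> bytes:                                   # R <-$ Coins_E
        return os.urandom(N)

    def _pad(self, key: bytes, r: bytes, length: int) -> bytes:
        nblocks = -(-length // N)
        return b"".join(hmac.new(key, r + i.to_bytes(8, "big"), hashlib.sha256).digest()[:N]
                        for i in range(nblocks))[:length]

    def enc(self, key: bytes, m: bytes, r: bytes) -> bytes:     # C <- E_K(M; R)
        return r + bytes(a ^ b for a, b in zip(m, self._pad(key, r, len(m))))

    def enc_random(self, key: bytes, m: bytes) -> bytes:        # C <-$ E_K(M)
        return self.enc(key, m, self.coins())

    def dec(self, key: bytes, c: bytes):                        # M, or None standing in for ⊥
        if len(c) < N:
            return None
        r, body = c[:N], c[N:]
        return bytes(a ^ b for a, b in zip(body, self._pad(key, r, len(body))))


se = ToyCTR()
k, m, r = se.keygen(), b"attack at dawn", se.coins()
assert se.enc(k, m, r) == se.enc(k, m, r)    # fixed coins -> deterministic
assert se.dec(k, se.enc_random(k, m)) == m   # D_K(E_K(M)) = M
```

Keeping R as a parameter is what lets an experiment reuse the same coins across a sequence of growing challenge messages, as the blockwise-adaptive experiment does.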

An encryption scheme is said to provide privacy against chosen-plaintext
attacks (be IND-CPA secure) if no adversary with reasonable resources can win
the game defined in the following experiment with probability non-negligibly
greater than 1/2. An adversary runs in two stages. In both stages it is given a symmetric
encryption oracle. During the first stage the adversary has to output two equal-length
messages. One of the messages is chosen at random. At the last stage the
adversary gets a challenge ciphertext, which is an encryption of this message. The
adversary wins if it can distinguish which message was encrypted.

Security against chosen-ciphertext attacks (IND-CCA security) is defined
similarly, except that the adversary in both stages is given a decryption oracle, with the
restriction that it may not query it during the last stage on the challenge ciphertext.

⁶ For convenience we use a slightly stronger definition than the one proposed in [1].
All the results and constructions of [1] satisfy our definition as well.

3 Online Encryption Schemes and Their Security

Let d, n be integers. Let SE = (𝒦, ℰ, 𝒟) be a symmetric encryption scheme with
MsgSp = D_{d,n}. We will refer to n as the block length of a message. We say that
SE is a scheme with n-on-line encryption if for every key K, coins R and message
M, C[1] (where C = ℰ_K(M; R)) depends only on K and R, and C[i] (for i ≥ 2)
depends only on M[i−2], M[i−1], C[i−1], K and possibly R⁷. Similarly, we say
that SE is a scheme with n-on-line decryption if for every key K and ciphertext
C, M[i] (for i ≥ 1) depends only on C[i], C[i+1], K and possibly R. In this work
we are interested in symmetric encryption schemes with n-on-line encryption and
decryption. We call such a subclass n-on-line symmetric encryption schemes. It
is easy to see that most of the popular block-cipher-based encryption schemes
(which are also called encryption modes) are on-line encryption schemes, e.g. the
CBC and CTR modes.

Privacy of on-line encryption schemes. We investigate stronger notions 
of security for on-line encryption schemes. As we mentioned in Section 1 we look 
for a notion of privacy which takes into account both blockwise-adaptive attacks 
and chosen-ciphertext attacks. 

But let us first show that even the standard notion of privacy against
chosen-ciphertext attacks is unachievable for on-line encryption schemes, meaning no
on-line encryption scheme can be IND-CCA secure. We prove the claim by presenting a
simple adversary attacking a given on-line encryption scheme. It does not make
use of any blockwise-adaptive attacks, but simply exploits the on-line property of
the decryption algorithm. The adversary is given the decryption oracle. First, the
adversary generates any two two-block messages with distinct first blocks. Given
the challenge ciphertext, which is an encryption of one of the two messages, it
removes its last block and queries the result to its decryption oracle. The query
differs from the challenge and is thus legitimate. Since the first block
of the corresponding plaintext does not depend on the removed last block of
the ciphertext, due to the on-line property of decryption, the adversary gets
back the first block of one of the messages it generated at the previous stage and
hence always wins its game.
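The attack fits in a few lines. The Python sketch below assumes a toy CFB-style scheme whose encryption and decryption are both on-line, with HMAC-SHA256 as the keyed function; the scheme and all helper names are illustrative assumptions, and only the on-line property of decryption is used by the adversary.

```python
import hashlib
import hmac
import os

N = 16  # toy block length in bytes (hypothetical)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def prf(key: bytes, x: bytes) -> bytes:
    return hmac.new(key, x, hashlib.sha256).digest()[:N]


def cfb_enc(key: bytes, m: bytes) -> bytes:
    """Toy on-line scheme (CFB-style): C[1] = IV, C[i+1] = M[i] xor F_K(C[i])."""
    out = [os.urandom(N)]
    for i in range(0, len(m), N):
        out.append(xor(m[i : i + N], prf(key, out[-1])))
    return b"".join(out)


def cfb_dec(key: bytes, c: bytes) -> bytes:
    cs = [c[i : i + N] for i in range(0, len(c), N)]
    return b"".join(xor(cs[i], prf(key, cs[i - 1])) for i in range(1, len(cs)))


def ind_cca_game(b: int) -> int:
    """IND-CCA game against the toy scheme; returns the adversary's guess."""
    key = os.urandom(32)
    m0 = bytes(N) + bytes(N)             # two two-block messages ...
    m1 = b"\xff" * N + bytes(N)          # ... with distinct first blocks
    challenge = cfb_enc(key, (m0, m1)[b])
    query = challenge[:-N]               # drop the last block: a legal, different query
    first_block = cfb_dec(key, query)    # on-line decryption still returns M_b[1]
    return 0 if first_block == m0[:N] else 1


assert all(ind_cca_game(b) == b for b in (0, 1))
```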

⁷ For an example of an on-line encryption scheme, see Figure 1 in Section 5.2. The
definition can easily be modified to include nonces, counters, etc., so as to cover any
intuitively on-line scheme, e.g. OCB.







The same attack would obviously apply when we enhance the IND-CCA notion
with blockwise-adaptive attacks. We want to define the strongest achievable
notion of privacy for on-line schemes. Let us start with some intuition.

As we discussed in the Introduction and above, in applications which
require on-line encryption schemes the standard notion of privacy against
chosen-ciphertext attacks cannot be achieved. This means that there exists a particular
type of chosen-ciphertext attack which an on-line encryption scheme cannot
resist, namely the one where the chosen ciphertext has the same prefix as the
challenge ciphertext. This does not mean, however, that practitioners should
not expect an on-line encryption scheme to resist various other chosen-ciphertext
attacks. For example, assume an adversary wants to decrypt a ciphertext of some
important message. Suppose that it flips a bit in each block of the ciphertext,
makes the receiver decrypt the result and manages to see the corresponding
message. It is highly undesirable that the adversary be able to infer some information
about the original important message. This kind of attack is captured by neither
the IND-CPA nor the IND-BLK-CPA notion, and as we show in the next section,
even the schemes known to be IND-BLK-CPA secure are subject to this type of
attack. But there exist on-line encryption schemes which resist such attacks as
well as blockwise-adaptive attacks. Therefore, such schemes would be desirable
in on-line applications with stronger security requirements. Accordingly, we
devise a notion of security which captures all blockwise-adaptive chosen-plaintext
attacks and in addition captures this type of chosen-ciphertext attack.

The intuition behind the notion is as follows. As in the IND-BLK-CPA notion,
the adversary gets an encryption oracle and outputs two equal-length messages
block by block, each time receiving the new portion of the challenge ciphertext,
which is an encryption of one of the messages. This captures blockwise-adaptive
attacks, since the adversary can adapt each next message-block pair depending on
the previously received portions of the ciphertext. To capture the aforementioned
class of chosen-ciphertext attacks, we allow the adversary to obtain decryptions
of ciphertexts of its choice unless the ciphertext has a common prefix with the
challenge ciphertext. This captures the attack discussed above. If, however, there
is a common prefix, say the first l blocks, the adversary can only see the decrypted
message starting from block l + 1. We now define the notion in detail.

We use the ideas of the model of security against blockwise-adaptive
chosen-plaintext attacks [15,12] and extend this model to capture on-line
chosen-ciphertext attacks. An adversary A runs in several stages. In all stages A is given
the encryption oracle and the on-line decryption oracle, which we define below. In
the first stage find_1 the adversary outputs two messages M_0[1], M_1[1] of block
length n (we will refer to such messages as message-blocks), state information
s it wants to preserve, and a flag done indicating whether these are the last
message-blocks it wants to output (in which case done = 1). A challenge bit b
and coin tosses R are chosen at random. In the next stage find_2 an encryption
of M_b[1] is computed using R and given to A along with s. A can continue
outputting more message-block pairs M_0[i], M_1[i] for i = 2, ..., which can be a
function of previously seen information, and updating s. Each time A receives the
current challenge ciphertext C, which is an encryption of M_b[1]||...||M_b[i] using
coins R. This process continues until A sets done = 1. At the last stage, guess, A
is given the whole challenge ciphertext and s and has to guess the challenge bit
b. Here is the formal definition.

Definition 1. Fix d, n ∈ ℕ. Let OSE = (𝒦, ℰ, 𝒟) be an on-line symmetric
encryption scheme with MsgSp = D_{d,n}. Let A be an adversary that has access
to two oracles. For b ∈ {0, 1} define the following experiments:

  Experiment Exp^{ind-blk-cca-b}_{OSE}(A)
    K ←$ 𝒦; R ←$ Coins_ℰ
    i ← 1; C ← ε; s ← ε
    Repeat
      (M_0[i], M_1[i], s, done) ←$ A^{ℰ_K(·), OD_K(·)}(find_i, C, s)
      C ← ℰ_K(M_b[1]||M_b[2]||...||M_b[i]; R)
      i ← i + 1
    Until done = 1
    b' ←$ A^{ℰ_K(·), OD_K(·)}(guess, C, s)
    Return b'

We require that A output two message-blocks of block length n in each
find stage and that the total number of find stages be at most d. The on-line
decryption oracle OD_K(·) takes inputs of length ln for some l ∈ ℕ. In stage
find_1, on input C' it returns M = 𝒟_K(C'). At later stages it returns M[k]||M[k+1]||...,
where M = 𝒟_K(C') and 1 ≤ k ≤ l is the smallest index such that
C'[k] ≠ C[k] (where C is the challenge ciphertext given to A at this stage).

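We define the advantage of the adversary A as follows:

$$\mathbf{Adv}^{\mathrm{ind\text{-}blk\text{-}cca}}_{\mathcal{OSE}}(A) = \Pr\left[\mathbf{Exp}^{\mathrm{ind\text{-}blk\text{-}cca\text{-}0}}_{\mathcal{OSE}}(A) = 0\right] - \Pr\left[\mathbf{Exp}^{\mathrm{ind\text{-}blk\text{-}cca\text{-}1}}_{\mathcal{OSE}}(A) = 0\right].$$

For any t, μ_t, q_e, μ_e, q_d, μ_d we define the advantage of OSE via

$$\mathbf{Adv}^{\mathrm{ind\text{-}blk\text{-}cca}}_{\mathcal{OSE}}(t, \mu_t, q_e, \mu_e, q_d, \mu_d) = \max_{A}\left\{\mathbf{Adv}^{\mathrm{ind\text{-}blk\text{-}cca}}_{\mathcal{OSE}}(A)\right\},$$

where the maximum is over all A having time complexity t, outputting message
blocks of total length at most μ_t in all find stages, making
at most q_e queries to the encryption oracle, whose total length is at most μ_e, and
making at most q_d queries to the on-line decryption oracle, whose total length
is at most μ_d.

We say that an on-line symmetric encryption scheme OSE is secure against
blockwise-adaptive chosen-ciphertext attacks (or simply IND-BLK-CCA secure)
if the function Adv^{ind-blk-cca}_{OSE}(·) grows slowly. □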
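The answering rule of the on-line decryption oracle is easy to misread, so here is a small Python sketch of it, following the definition literally: both the query C' and the plaintext M = 𝒟_K(C') are indexed in blocks of n bytes. The function name, the byte-level block size, and the choice to return the empty string when C' agrees with C on every one of its blocks are assumptions of this sketch rather than part of the definition; the placeholder identity "decryption" exists only to make the indexing visible.

```python
from typing import Callable, Optional


def online_dec_oracle(dec: Callable[[bytes], bytes], query: bytes,
                      challenge: Optional[bytes], n: int) -> bytes:
    """Answer a query C' to the on-line decryption oracle OD_K of Definition 1.

    dec:       the scheme's deterministic decryption algorithm D_K
    challenge: the challenge ciphertext C seen by A so far (None in stage find_1)
    n:         block length, in bytes in this toy setting
    """
    m = dec(query)
    if challenge is None:                      # stage find_1: return D_K(C') in full
        return m
    l = len(query) // n
    for k in range(1, l + 1):                  # smallest k with C'[k] != C[k]
        if query[(k - 1) * n : k * n] != challenge[(k - 1) * n : k * n]:
            return m[(k - 1) * n :]            # M[k] || M[k+1] || ...
    return b""  # assumption: C' is a prefix of C, so no plaintext block is revealed


def identity(c: bytes) -> bytes:               # placeholder "decryption" for the demo
    return c


C = b"AAAA" + b"BBBB"
assert online_dec_oracle(identity, b"AAAA" + b"XXXX", C, 4) == b"XXXX"
assert online_dec_oracle(identity, b"AAAA", C, 4) == b""
assert online_dec_oracle(identity, b"ZZZZ", None, 4) == b"ZZZZ"
```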

Let us briefly recall why we do not explicitly allow the adversary to query
its on-line decryption oracle in a “blockwise-adaptive” manner. The reason is that
we could, but this would not add any power to the adversary. This is
because, unlike encryption, the decryption algorithm is deterministic. Therefore,
to see the decryption of a ciphertext C = C[1]||C[2]||...||C[l] block by block,
our adversary can simply query its decryption oracle on C[1], C[1]||C[2], ...,
C[1]||C[2]||...||C[l].
4 Analyses of Several Online Encryption Schemes

It was suggested in [15] that fully parallelizable encryption modes such as counter mode (CTR) are not subject to blockwise-adaptive chosen-plaintext attacks. It is not hard to see, however, that CTR is subject to a straightforward blockwise-adaptive chosen-ciphertext attack. An adversary can simply choose two distinct one-block messages, flip a bit in the challenge ciphertext and query the result to the on-line decryption oracle. It gets back one of the two challenge messages with a flipped bit in it. Thus the adversary can always win its game.
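A minimal sketch of the CTR attack described above follows. It assumes randomized CTR, with the counter start sent as the first ciphertext block. HMAC-SHA256, truncated to 128 bits, stands in for the block cipher; this is a substitution made only for the sketch. The function names and the 16-byte block size are also illustrative assumptions.

```python
import hashlib
import hmac
import os

BLOCK = 16  # n = 128 bits (assumption for this sketch)


def prf(key: bytes, counter: int) -> bytes:
    """Keystream block F_K(counter); HMAC-SHA256 stands in for the block cipher."""
    return hmac.new(key, counter.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_encrypt(key: bytes, blocks: list) -> list:
    """Randomized CTR: C[0] = r, C[i] = M[i] xor F_K(r + i)."""
    r = int.from_bytes(os.urandom(BLOCK), "big")
    pads = [prf(key, (r + i) % 2 ** (8 * BLOCK)) for i in range(1, len(blocks) + 1)]
    return [r.to_bytes(BLOCK, "big")] + [xor(m, p) for m, p in zip(blocks, pads)]


def ctr_decrypt(key: bytes, ct: list) -> list:
    r = int.from_bytes(ct[0], "big")
    return [xor(c, prf(key, (r + i) % 2 ** (8 * BLOCK))) for i, c in enumerate(ct[1:], start=1)]


def ctr_cca_attack(key: bytes, b: int) -> int:
    """Play the IND-BLK-CCA game against CTR with hidden bit b; return the adversary's guess."""
    m0, m1 = bytes(BLOCK), b"\xff" * BLOCK           # two distinct one-block messages
    challenge = ctr_encrypt(key, [m1 if b else m0])  # challenger encrypts M_b
    mask = b"\x01" + bytes(BLOCK - 1)
    tampered = [challenge[0], xor(challenge[1], mask)]  # flip one bit of the data block
    # The first block that differs from the challenge is the data block itself, so the
    # on-line decryption oracle returns its full decryption: M_b with one bit flipped.
    answer = ctr_decrypt(key, tampered)[0]
    return 1 if xor(answer, mask) == m1 else 0
```

The guess is always correct, because in CTR a bit flip in a ciphertext block produces exactly the same bit flip in the corresponding plaintext block.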

In [12] the authors prove that the CFB encryption scheme [18] and DCBC, a modification of the CBC encryption scheme, are on-line encryption schemes secure against blockwise-adaptive chosen-plaintext attacks. We show that these schemes are, unfortunately, insecure against blockwise-adaptive chosen-ciphertext attacks. This is consistent with the design goals of these schemes, which did not include resistance to chosen-ciphertext attacks. The attacks are simple and similar to the one on CTR mode. We define the schemes and present the attacks in [8].

In the next section we propose schemes which are IND-BLK-CCA secure. 



5 Proposed Schemes and Their Security 

5.1 Online-Cipher-Based Online Encryption Schemes 

It is suggested (without proof) in [15] that the encryption scheme based on the HPCBC on-line cipher [1] does not seem to be vulnerable to blockwise-adaptive chosen-plaintext attacks. We generalize, strengthen and formally justify this suggestion. We propose an encryption scheme based on any strong on-line cipher and formally prove that it resists not only blockwise-adaptive chosen-plaintext attacks but also blockwise-adaptive chosen-ciphertext ones.

It is shown in [1] that on-line ciphers provide security against standard chosen-plaintext attacks if the plaintext space has appropriate properties, namely, if the first block of a plaintext is a random string. We show that any secure strong on-line cipher applied to messages with a random first block (or to any message with a prepended random block) is also an on-line symmetric encryption scheme secure against blockwise-adaptive chosen-ciphertext attacks.



Construction 1. Let $n, d$ be integers, and let $F: \mathit{Keys}(F) \times D_{d,n} \to D_{d,n}$ be an on-line cipher. We associate to them the following symmetric encryption scheme $\mathcal{OCSE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$:

Algorithm $\mathcal{K}$
  K ←$ Keys(F)
  Return K

Algorithm $\mathcal{E}_K(M)$
  R ←$ {0,1}^n
  X ← R‖M
  C ← F(K, X)
  Return C

Algorithm $\mathcal{D}_K(C)$
  X ← F^{-1}(K, C)
  Parse X as R‖M with |R| = n
  Return M
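The following Python sketch mirrors Construction 1 under stated assumptions. Blocks are 16 bytes. The toy on-line cipher `toy_online_encrypt` is defined here for illustration only and does not come from the paper: it XORs each block with a pseudorandom pad derived from the preceding plaintext blocks. That toy is on-line and invertible, but it is not a strong on-line cipher. The sketch therefore shows only the structure of $\mathcal{OCSE}$: to encrypt, prepend a random block and apply $F$; to decrypt, invert $F$ and drop the first block. It carries none of the security guarantee of Theorem 1.

```python
import hashlib
import hmac
import os

N = 16  # block length in bytes (n = 128 bits); an assumption for the sketch


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _pad(key: bytes, prefix: list) -> bytes:
    # The pad depends only on the preceding plaintext blocks, which makes the toy cipher on-line.
    return hmac.new(key, b"".join(prefix), hashlib.sha256).digest()[:N]


def toy_online_encrypt(key: bytes, blocks: list) -> list:
    """Toy on-line cipher F(K, X): C[i] = X[i] xor PRF_K(X[1..i-1]).

    Length-preserving, invertible and on-line, but NOT a strong on-line cipher
    (C[1] is X[1] xor a fixed pad); it only stands in for F here.
    """
    return [xor(x, _pad(key, blocks[:i])) for i, x in enumerate(blocks)]


def toy_online_decrypt(key: bytes, blocks: list) -> list:
    out: list = []
    for c in blocks:
        out.append(xor(c, _pad(key, out)))  # pad uses the plaintext recovered so far
    return out


def ocse_encrypt(key: bytes, message: list) -> list:
    r = os.urandom(N)                              # R <-$ {0,1}^n
    return toy_online_encrypt(key, [r] + message)  # C <- F(K, R || M)


def ocse_decrypt(key: bytes, ciphertext: list) -> list:
    x = toy_online_decrypt(key, ciphertext)        # X <- F^{-1}(K, C)
    return x[1:]                                   # parse X as R || M with |R| = n
```

Because the toy cipher is on-line, decrypting a ciphertext prefix returns the corresponding message prefix. This is the property that makes $\mathcal{OCSE}$ an on-line scheme.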



Given that $F$ is an on-line cipher, it is easy to see that $\mathcal{OCSE}$ is an on-line encryption scheme. We want to show that it provides IND-BLK-CCA security when $F$ is an on-line cipher secure against chosen-ciphertext attacks (a strong on-line cipher). The following theorem states the result.
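Theorem 1. Let $d, n$ be integers, and let $F: \mathit{Keys}(F) \times D_{d,n} \to D_{d,n}$ be an $n$-on-line cipher. Let $\mathcal{OCSE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be the on-line symmetric encryption scheme associated to it as per Construction 1. Then

$$\mathbf{Adv}^{\text{ind-blk-cca}}_{\mathcal{OCSE}}(t, \mu_t, q_e, \mu_e, q_d, \mu_d) \le 2\cdot\mathbf{Adv}^{\text{oprp-cca}}_{F}(t', q'_e, \mu'_e, q_d, \mu_d) + \frac{q_e + q_d}{2^{n-1}},$$

where

$$t' = t + O(q_e + q_d + \mu_t), \qquad q'_e = q_e + \frac{\mu_t}{2n}, \qquad \mu'_e = \mu_e + nq_e + \frac{\mu_t^2}{8n} + \frac{3\mu_t}{4}.$$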
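Part of the resource statement in Theorem 1 is hard to read in the source. The following sketch therefore recomputes $B$'s encryption-query resources directly from its pseudocode. It rests on two assumptions. First, each find stage outputs one block of each message, so there are $\mu_t/(2n)$ stages. Second, $B$'s $i$-th challenge query is $R\|M_b[1]\|\cdots\|M_b[i]$, of $(i+1)n$ bits. The helper names are hypothetical. The check confirms that the summed challenge-query length matches the closed form $\mu_t^2/(8n) + 3\mu_t/4$.

```python
def challenge_query_bits(n: int, stages: int) -> int:
    """Total bits in B's challenge queries g(R || M_b[1..i]), i = 1..stages."""
    # The i-th query consists of the random block R plus i message blocks.
    return sum(n * (i + 1) for i in range(1, stages + 1))


def closed_form_bits(n: int, mu_t: int) -> float:
    """mu_t^2 / (8n) + 3 mu_t / 4, the closed form assumed in Theorem 1."""
    return mu_t ** 2 / (8 * n) + 3 * mu_t / 4


def primed_encryption_resources(n: int, mu_t: int, q_e: int, mu_e: int):
    """Return (q'_e, mu'_e): A's queries plus B's extra random blocks and challenge queries."""
    stages = mu_t // (2 * n)  # each find stage outputs M_0[i] and M_1[i]
    q_e_prime = q_e + stages
    mu_e_prime = mu_e + n * q_e + challenge_query_bits(n, stages)
    return q_e_prime, mu_e_prime
```

For example, with $n = 128$ and ten find stages ($\mu_t = 2560$), both `challenge_query_bits(128, 10)` and the closed form give 8320 bits.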



Proof. Let $A$ be an adversary attacking the $\mathcal{OCSE}$ scheme by means of blockwise-adaptive chosen-ciphertext attacks. We construct a distinguisher $B$ attacking the pseudorandomness of the on-line cipher $F$ against chosen-ciphertext attacks. $B$ has access to two oracles. The first, $g$, is either a random instance of the on-line cipher $F$ or a random instance of $\mathrm{OPerm}_{d,n}$; the second, $g^{-1}$, is the inverse of $g$. $B$ runs $A$ as a subroutine and has to answer its oracle queries and provide the necessary inputs. We present pseudocode for $B$ followed by some explanations and the analysis.

Adversary $B^{g, g^{-1}}$
  R ←$ Coins_E ; b ←$ {0,1}
  i ← 1 ; C ← ε ; s ← ε
  Repeat
    Run A on input (find_i, C, s)
    When A outputs (M_0[i], M_1[i], s, done), do
      C ← g(R‖M_b[1]‖M_b[2]‖…‖M_b[i])
      i ← i + 1
      Return (find_i, C, s) to A
    On A's encryption query M, do
      R' ←$ Coins_E
      Y ← g(R'‖M)
      Return Y to A
    On A's on-line decryption query C', do
      X ← g^{-1}(C'); parse X as R''‖M' where |R''| = n
      l ← |M'|/n
      Return M'[k]‖M'[k+1]‖…‖M'[l] to A,
        where k is the smallest index such that C'[k] ≠ C[k]
  Until done = 1
  Run A on input (guess, C, s), replying to its oracle queries as before,
    until it halts and returns b'
  If b' = b then return 1 else return 0 EndIf



$B$ picks a random bit $b$ and a random $n$-bit string $R$ and runs the adversary $A$, answering its oracle queries and providing all the necessary inputs for it. When $A$ outputs two messages block by block during the find stages, $B$ each time returns the current challenge ciphertext. The challenge ciphertext is computed by prepending $R$ to the message determined by the bit $b$ and querying the result to $B$'s own oracle $g$. Similarly, $B$ uses its oracles $g, g^{-1}$ to answer $A$'s encryption and on-line decryption oracle queries. This is possible since the encryption and decryption algorithms use only oracle access to $F(K, \cdot)$.

A little extra care is required when answering $A$'s on-line decryption oracle queries. According to the experiment defined in Definition 1, $B$ returns to $A$ only the message blocks starting from the position at which the query differs from the latest challenge ciphertext given to $A$.

Finally, when $A$ outputs its guess at the end of the guess stage, $B$ compares it with the bit $b$. If they are equal, $B$ concludes that $g$ is a random instance of the on-line cipher and outputs 1. Otherwise it outputs 0, to indicate its guess that $g$ is a random on-line permutation.

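We now proceed to the analysis. First consider the case when $g$ is a random instance of the on-line cipher $F$. We claim that $A$'s view in the simulated experiment is exactly as in experiment $\mathbf{Exp}^{\text{ind-blk-cca-}b}_{\mathcal{OCSE},A}$. By construction, $B$ returns 1 if and only if the adversary $A$ guesses the random bit $b$ correctly. Thus we have:

$$\Pr\bigl[K \xleftarrow{\$} \mathit{Keys}(F) : B^{F_K, F_K^{-1}} = 1\bigr] = \tfrac{1}{2}\Pr\bigl[\mathbf{Exp}^{\text{ind-blk-cca-0}}_{\mathcal{OCSE},A} = 0\bigr] + \tfrac{1}{2}\Bigl(1 - \Pr\bigl[\mathbf{Exp}^{\text{ind-blk-cca-1}}_{\mathcal{OCSE},A} = 0\bigr]\Bigr) = \tfrac{1}{2} + \tfrac{1}{2}\,\mathbf{Adv}^{\text{ind-blk-cca}}_{\mathcal{OCSE},A}. \qquad (1)$$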



Next consider the case when $g$ is a random instance of $\mathrm{OPerm}_{d,n}$. We claim that

$$\Pr\bigl[g \xleftarrow{\$} \mathrm{OPerm}_{d,n} : B^{g, g^{-1}} = 1\bigr] \le \frac{1}{2} + \frac{q_e + q_d}{2^n}. \qquad (2)$$



Subtracting Equation (2) from Equation (1) and taking the maximum, we get the statement of the theorem. Due to lack of space, the proof of Equation (2) and the check of the resources used by the adversary $B$ are given in [8]. □
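The subtraction step mentioned above can be written out as follows. It assumes that $\mathbf{Adv}^{\text{oprp-cca}}_F(B)$ denotes the difference between $B$'s probabilities of outputting 1 in the two worlds. Taking the maximum over adversaries then gives the bound of Theorem 1.

```latex
\begin{aligned}
\mathbf{Adv}^{\text{oprp-cca}}_{F}(B)
  &= \Pr\bigl[B^{F_K,F_K^{-1}} = 1\bigr] - \Pr\bigl[B^{g,g^{-1}} = 1\bigr] \\
  &\ge \Bigl(\tfrac12 + \tfrac12\,\mathbf{Adv}^{\text{ind-blk-cca}}_{\mathcal{OCSE},A}\Bigr)
     - \Bigl(\tfrac12 + \frac{q_e+q_d}{2^{n}}\Bigr) \\
  &= \tfrac12\,\mathbf{Adv}^{\text{ind-blk-cca}}_{\mathcal{OCSE},A} - \frac{q_e+q_d}{2^{n}},
\end{aligned}
\qquad\text{hence}\qquad
\mathbf{Adv}^{\text{ind-blk-cca}}_{\mathcal{OCSE},A}
  \le 2\,\mathbf{Adv}^{\text{oprp-cca}}_{F}(B) + \frac{q_e+q_d}{2^{n-1}}.
```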



5.2 HPCBC Encryption Scheme 

We present the HPCBC encryption scheme, based on the HPCBC on-line cipher proposed in [1]. Following Construction 1, the encryption algorithm applies the HPCBC on-line cipher to the input message prepended with a random block. Here are the details of the scheme.

Let $d, n \in \mathbb{N}$, let $E: \mathit{Keys}(E) \times \{0,1\}^n \to \{0,1\}^n$ be a block cipher family and let $H: \mathit{Keys}(H) \times \{0,1\}^{2n} \to \{0,1\}^n$ be a family of functions. Messages of HPCBC are strings of $n$-bit blocks, and the scheme consists of the following three algorithms:
Algorithm $\mathcal{K}$
  K_1 ←$ Keys(E)
  K_2 ←$ Keys(H)
  K ← K_1‖K_2
  Return K

Algorithm $\mathcal{E}_K(M)$
  Parse K into K_1‖K_2
  Parse M into n-bit blocks M[1]‖…‖M[l]
  R ←$ {0,1}^n ; M[0] ← R
  T ← 0^{2n} ; Y ← H(K_2, T)
  P ← Y ⊕ M[0]
  C[1] ← E(K_1, P) ⊕ Y
  For i = 2, …, l+1 do
    T ← M[i−2]‖C[i−1]
    Y ← H(K_2, T)
    P ← Y ⊕ M[i−1]
    C[i] ← E(K_1, P) ⊕ Y
  EndFor
  C ← C[1]‖…‖C[l+1]
  Return C

Algorithm $\mathcal{D}_K(C)$
  Parse K into K_1‖K_2
  Parse C into n-bit blocks C[1]‖…‖C[l+1]
  T ← 0^{2n} ; Y ← H(K_2, T)
  P ← E^{-1}(K_1, C[1] ⊕ Y)
  M[0] ← Y ⊕ P
  For i = 1, …, l do
    T ← M[i−1]‖C[i]
    Y ← H(K_2, T)
    P ← E^{-1}(K_1, C[i+1] ⊕ Y)
    M[i] ← Y ⊕ P
  EndFor
  Return M[1]‖…‖M[l]

Fig. 1. The HPCBC encryption scheme



The HPCBC encryption scheme is depicted in Figure 1, from which it is easy to see that the scheme is on-line. For each input block, the scheme makes one call to a keyed hash function in addition to one call to the block cipher. The HPCBC cipher has been proved to be a strong on-line cipher (secure against chosen-ciphertext attacks) [1], assuming $E$ is a strong PRP family and $H$ is an almost-xor-universal function family. Hence, it follows from Theorem 1 that HPCBC is IND-BLK-CCA secure. The concrete security result can easily be obtained from the result of [1] and from Theorem 1.
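The following Python sketch of Fig. 1 is for illustration only. The block cipher $E$ is replaced by a toy 4-round Feistel network built from HMAC-SHA256. $H$ is replaced by truncated HMAC-SHA256, which is a PRF rather than the almost-xor-universal family assumed in [1]. The block size (16 bytes) and all function names are assumptions of this sketch. It demonstrates the chaining and the on-line property, not the security of HPCBC.

```python
import hashlib
import hmac
import os

N = 16         # n = 128-bit blocks (assumption)
HALF = N // 2


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(k1: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(k1 + bytes([i]), half, hashlib.sha256).digest()[:HALF]


def E(k1: bytes, x: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for the block cipher E."""
    l, r = x[:HALF], x[HALF:]
    for i in range(4):
        l, r = r, xor(l, _f(k1, i, r))
    return l + r


def E_inv(k1: bytes, y: bytes) -> bytes:
    l, r = y[:HALF], y[HALF:]
    for i in reversed(range(4)):
        l, r = xor(r, _f(k1, i, l)), l
    return l + r


def H(k2: bytes, t: bytes) -> bytes:
    """Keyed hash {0,1}^2n -> {0,1}^n; HMAC stands in for the AXU family H."""
    return hmac.new(k2, t, hashlib.sha256).digest()[:N]


def hpcbc_encrypt(k1: bytes, k2: bytes, message: list) -> list:
    m = [os.urandom(N)] + list(message)          # M[0] <- R
    prev_m, prev_c, c = bytes(N), bytes(N), []   # T = 0^{2n} for the first block
    for block in m:
        y = H(k2, prev_m + prev_c)               # Y <- H(K2, M[i-2] || C[i-1])
        prev_c = xor(E(k1, xor(y, block)), y)    # C[i] <- E(K1, Y xor M[i-1]) xor Y
        prev_m = block
        c.append(prev_c)
    return c


def hpcbc_decrypt(k1: bytes, k2: bytes, ciphertext: list) -> list:
    prev_m, prev_c, m = bytes(N), bytes(N), []
    for block in ciphertext:
        y = H(k2, prev_m + prev_c)
        prev_m = xor(y, E_inv(k1, xor(block, y)))  # M[i] <- Y xor E^{-1}(K1, C[i+1] xor Y)
        prev_c = block
        m.append(prev_m)
    return m[1:]                                 # drop M[0] = R
```

Each ciphertext block depends only on the random block and the message blocks before it. For that reason, decrypting the first $j+1$ ciphertext blocks already yields the first $j$ message blocks.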
Abstract. In this paper, we present related-key slide attacks on 2-key and 3-key triple-DES, and related-key differential and slide attacks on two variants of DESX. First, we show that 2-key and 3-key triple-DES are susceptible to related-key slide attacks. The only previously known related-key attacks on these ciphers are related-key differential attacks on 3-key triple-DES. Second, we present a related-key differential attack on DESX+, a variant of DESX with its pre- and post-whitening XOR operations replaced with addition modulo $2^{64}$. Our attack shows a counter-intuitive result: DESX+ is weaker than DESX against a related-key attack. Third, we present the first known attacks on DES-EXE, another variant of DESX where the XOR operations and DES encryptions are interchanged. Further, our attacks show that DES-EXE is also weaker than DESX against a related-key attack. This work suggests that extreme care has to be taken when proposing variants of popular block ciphers, and that newer variants are not always more resistant to attacks.



1 Introduction 

Due to the small 56-bit key size of DES, variants of DES under multiple encryption have been considered, including double-DES under one or two 56-bit keys, and triple-DES under two or three 56-bit keys. Another variant based on DES is DESX [9].

In this paper, we consider the security of 2-key and 3-key triple-DES against 
related-key slide attacks, and the security of DESX variants against both related- 
key slide and related-key differential attacks. We point out that our results on 
the DESX variants do not invalidate the security proofs of [9,10], but serve to 
illustrate the limitations of their model. In particular, we argue that one should 
also consider a more flexible model that incorporates related-key queries [1,7,8]. 



1.1 Our Model 

Related-key attacks are those where the cryptanalyst is able to obtain the encryptions of certain plaintexts under both the unknown secret key $K$ and an unknown related key $K'$ whose relationship to $K$ is known, or can even be chosen [1,7,8]. Most researchers regard related-key attacks as strictly theoretical, since they involve a strong and restricted attack model. However, as several researchers have demonstrated [7,8], some current real-world cryptographic implementations may allow practical related-key attacks. Examples include key-exchange protocols and hash functions; for details we refer the reader to [7,8].

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 15-24, 2004.

1.2 Outline of the Paper 

We briefly recall previous attacks on variants of triple-DES and DESX in Section 2. In Section 3, we present our related-key slide attacks on 2-key and 3-key triple-DES. In Section 4 we then present related-key attacks on two DESX variants. The first is DESX+ [9], which replaces the pre- and post-whitening XOR operations of DESX with additions modulo $2^{64}$. The second is DES-EXE [6], whose outer XOR operations are interchanged with the inner DES encryption. We show that these variants are weaker than the original DESX against related-key attacks. We conclude in Section 5.

2 Previous Work 

In this section we review previous attacks on variants of triple-DES and of DESX.

Two-key triple-DES can be broken with a meet-in-the-middle (MITM) attack requiring $2^{56}$ chosen plaintexts (CPs), $2^{56}$ memory and $2^{56}$ single DES encryptions [12]. There is also an attack by van Oorschot and Wiener [13] that requires $m$ known plaintexts (KPs), $m$ words of memory and approximately $2^{120}/m$ single DES encryptions. For m = 2®®, the number of encryptions is roughly 2^^^.

Meanwhile, the most basic attack on three-key triple-DES is the MITM attack, which requires 3 chosen plaintexts, $2^{56}$ memory and $2^{112}$ single DES encryptions. In [11], Lucks proposed an attack that requires $2^{32}$ known plaintexts, $2^{88}$ memory and roughly $2^{108}$ single DES encryptions. There is also a related-key differential attack by Kelsey et al. [7] that works with one known plaintext, one related-key chosen ciphertext (RK-CC), and $2^{56}$ single DES encryptions.¹

As for DESX, Daemen proposed an attack [5] requiring $2^{32}$ chosen plaintexts and $2^{88}$ single DES encryptions, or 2 known plaintexts and $2^{120}$ single DES encryptions. Meanwhile, another attack by Kilian and Rogaway [9,10] requires $m$ known plaintexts and 2®®®“*°®^ ™ single DES encryptions. For $m = 2^{32}$, the number of encryptions is roughly 2®®®. By making use of related-key queries, Kelsey et al. [8] demonstrated an attack that requires 2® related-key known plaintexts (RK-KPs) and $2^{120}$ single DES encryptions. Recently, Biryukov and Wagner [4] presented a more efficient attack requiring $2^{32.5}$ known plaintexts, $2^{32.5}$ memory and $2^{87.5}$ single DES encryptions.

¹ As pointed out by an anonymous referee, our estimates are independent of the memory access time, in contrast to the approach taken in [14]; hence we assume no difference between memory with slow access and memory with intensive access. This general approach has been adopted in this paper to maintain uniformity with previous results.



3 Related-Key Slide Attacks on Triple-DES 

3.1 Attacking 3-Key Triple-DES 

We first consider three-key triple-DES, which was attacked by a related-key differential attack in [7]. We denote the encryption of $P$ under the key $K = (K_1, K_2, K_3)$ by:

$$C = E_{K_3}(E^{-1}_{K_2}(E_{K_1}(P))). \qquad (1)$$

Suppose we also obtain the three-key triple-DES decryption $P'$ of another ciphertext, $C' = E_{K_1}(P)$, under a related key $K' = (K_1, K_3, K_2)$. We then get the situation shown in Fig. 1.




Fig. 1. Sliding-with-a-twist on 3-key triple-DES of the form $E_{K_1}E^{-1}_{K_2}E_{K_3}$

We have in essence aligned the encryption $E_{K_3} \circ E^{-1}_{K_2} \circ E_{K_1}$ under key $K$ with the decryption $E^{-1}_{K_1} \circ E_{K_3} \circ E^{-1}_{K_2}$ under key $K'$ in a sliding-with-a-twist [4] style. The plaintexts $P$ and $P'$, and the ciphertexts $C$ and $C'$, are hence related by the following slid equations:

$$C' = E_{K_1}(P) \qquad (2)$$

$$P' = E^{-1}_{K_1}(C) \qquad (3)$$



Our related-key slide attack works as follows:

1. Obtain $2^{32}$ known plaintexts $P$ encrypted with three-key triple-DES under the key $K = (K_1, K_2, K_3)$.

2. Obtain another $2^{32}$ known ciphertexts $C'$ decrypted with three-key triple-DES under the key $K' = (K_1, K_3, K_2)$. Store the values of $(C', P')$ in a table, T1. By the birthday paradox, we expect one slid pair $(P, C)$ and $(P', C')$ such that the slid equations (2) and (3) are satisfied.

3. Guess all $2^{56}$ values of $K_1$ and do:

(i) Partially encrypt all $2^{32}$ plaintexts $P$ under the key $K_1$.

(ii) Search through T1 for a collision of the 1st element with the result of (i). Such a collision satisfies the slid equation (2).

(iii) For such a collision, partially decrypt $C$ under $K_1$ and check for a collision of this result with the 2nd element of the matching T1 entry. The latter collision satisfies the slid equation (3).
The first step requires $2^{32}$ known plaintexts while Step 2 requires $2^{32}$ related-key known ciphertexts and $2^{32} \times 2 = 2^{33}$ words of memory. Step 3 requires $2^{56} \times 2^{32} = 2^{88}$ single DES encryptions, and no memory. To summarize, we have an attack on three-key triple-DES that requires $2^{32}$ known plaintexts, $2^{32}$ related-key known ciphertexts (RK-KCs), $2^{33}$ words of memory and $2^{88}$ single DES encryptions.
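The attack can be tried at toy scale. The sketch below is a miniature built on assumptions and is not DES. A 4-round Feistel network on 16-bit blocks with 8-bit keys stands in for single DES. With these sizes the birthday step needs only about $2^{8}$ texts per side; we use $2^{10}$ to make a slid pair very likely. The key guess loops over $2^8$ values. The round function and all names are hypothetical. The secret keys are passed in only so that the code can simulate the two encryption/decryption oracles.

```python
import random

BLOCK_BITS = 16  # toy block size (DES: 64)
KEY_BITS = 8     # toy key size (DES: 56)
MASK8 = 0xFF


def _f(k: int, r: int, i: int) -> int:
    # Arbitrary keyed round function; any choice yields an invertible Feistel cipher.
    return ((r * 167 + k * 29 + i * 71) ^ (k >> 3) ^ (r >> 3)) & MASK8


def enc(k: int, x: int) -> int:
    """Toy 4-round Feistel on 16-bit blocks, standing in for single DES."""
    l, r = x >> 8, x & MASK8
    for i in range(4):
        l, r = r, l ^ _f(k, r, i)
    return (l << 8) | r


def dec(k: int, y: int) -> int:
    l, r = y >> 8, y & MASK8
    for i in reversed(range(4)):
        l, r = r ^ _f(k, l, i), l
    return (l << 8) | r


def triple_enc(k1: int, k2: int, k3: int, p: int) -> int:
    """Eq. (1): C = E_K3(E^{-1}_K2(E_K1(P)))."""
    return enc(k3, dec(k2, enc(k1, p)))


def triple_dec(k1: int, k2: int, k3: int, c: int) -> int:
    """Inverse of triple_enc under the key (k1, k2, k3)."""
    return dec(k1, enc(k2, dec(k3, c)))


def slide_attack(k1: int, k2: int, k3: int, texts: int = 1 << 10, seed: int = 0) -> set:
    rng = random.Random(seed)
    # Step 1: known plaintexts encrypted under K = (K1, K2, K3).
    pairs = [(p, triple_enc(k1, k2, k3, p))
             for p in rng.sample(range(1 << BLOCK_BITS), texts)]
    # Step 2: known ciphertexts decrypted under K' = (K1, K3, K2); table T1 maps C' -> P'.
    t1 = {c2: triple_dec(k1, k3, k2, c2)
          for c2 in rng.sample(range(1 << BLOCK_BITS), texts)}
    # Step 3: guess K1 and test the slid equations (2) and (3).
    candidates = set()
    for guess in range(1 << KEY_BITS):
        for p, c in pairs:
            x = enc(guess, p)                       # (i): candidate C' = E_K1(P)
            if x in t1 and dec(guess, c) == t1[x]:  # (ii) and (iii): P' = E^{-1}_K1(C)
                candidates.add(guess)
                break
    return candidates
```

With $2^{10}$ texts on each side, about 16 slid pairs are expected, so the true $K_1$ is recovered except with negligible probability. A wrong guess survives only if it happens to satisfy both slid equations for some pair.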

We note that a similar attack also applies to three-key triple-DES of the form:

$$C = E_{K_3}(E_{K_2}(E_{K_1}(P))). \qquad (4)$$

In this case, instead of sliding an encryption with a decryption, we slide two encryptions, one under the key $K = (K_1, K_2, K_3)$ and the other under $K' = (K_2, K_3, K_1)$, and obtain the situation shown in Fig. 2.




Fig. 2. Sliding 3-key triple-DES of the form $E_{K_1}E_{K_2}E_{K_3}$



3.2 Attacking 2-Key Triple-DES 

Two-key triple-DES is also vulnerable to a related-key slide attack. We slide an encryption under the key $K = (K_1, K_2)$ with a decryption under the key $K' = (K_2, K_1)$. We then have the situation in Fig. 3.




Fig. 3. Sliding-with-a-twist on 2-key triple-DES of the form $E_{K_1}E^{-1}_{K_2}E_{K_1}$

We thus obtain the slid equations:

$$C' = E_{K_1}(P) \qquad (5)$$

$$P' = E^{-1}_{K_2}(C) \qquad (6)$$
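A quick consistency check of the slid equations (5) and (6) follows. It uses the same kind of toy 16-bit Feistel cipher as a stand-in for DES; this substitution is an assumption, and the names are hypothetical. The keys are known to the check, which is not the attacker's situation: the point is only that when the related-key decryption is requested on $C' = E_{K_1}(P)$, the returned $P'$ equals $E^{-1}_{K_2}(C)$.

```python
MASK8 = 0xFF


def _f(k: int, r: int, i: int) -> int:
    # Arbitrary keyed round function; any choice yields an invertible Feistel cipher.
    return ((r * 167 + k * 29 + i * 71) ^ (k >> 3) ^ (r >> 3)) & MASK8


def enc(k: int, x: int) -> int:
    """Toy 4-round Feistel on 16-bit blocks, standing in for single DES."""
    l, r = x >> 8, x & MASK8
    for i in range(4):
        l, r = r, l ^ _f(k, r, i)
    return (l << 8) | r


def dec(k: int, y: int) -> int:
    l, r = y >> 8, y & MASK8
    for i in reversed(range(4)):
        l, r = r ^ _f(k, l, i), l
    return (l << 8) | r


def two_key_enc(k1: int, k2: int, p: int) -> int:
    """Two-key triple encryption E_K1(E^{-1}_K2(E_K1(P)))."""
    return enc(k1, dec(k2, enc(k1, p)))


def two_key_dec(k1: int, k2: int, c: int) -> int:
    """Decryption under the key (k1, k2)."""
    return dec(k1, enc(k2, dec(k1, c)))


def check_slid_pair(k1: int, k2: int, p: int) -> bool:
    c = two_key_enc(k1, k2, p)              # encryption of P under K = (K1, K2)
    c_prime = enc(k1, p)                    # pick C' so that (5) holds
    p_prime = two_key_dec(k2, k1, c_prime)  # related-key decryption under K' = (K2, K1)
    return p_prime == dec(k2, c)            # slid equation (6)
```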

The attack is as follows:

1. Obtain $2^{32}$ known plaintexts $P$ encrypted with two-key triple-DES under the key $K = (K_1, K_2)$.

2. Obtain another $2^{32}$ known ciphertexts $C'$ decrypted with two-key triple-DES under the key $K' = (K_2, K_1)$. Store the values of $(C', P')$ in a table, T1. By the birthday paradox, we expect one slid pair $(P, C)$ and $(P', C')$ such that the slid equations (5) and (6) are satisfied.

3. Guess all $2^{56}$ values of $K_1$ and do:

(i) Partially encrypt all $P$ under the key $K_1$.

(ii) Store $(E_{K_1}(P), C, K_1)$ in another table, T2.

4. Search through T1 and T2 for collisions in the first element, which immediately reveals the corresponding key $K_1$. T2 has $2^{56} \times 2^{32} = 2^{88}$ entries, and a collision occurs with probability $2^{-64}$. We therefore expect $2^{88} \times 2^{-64} = 2^{24}$ values of $K_1$ to be suggested, and $2^{24}$ $(E_{K_1}(P), C, K_1)$ entries in T2 to survive this filtering.

5. For all $2^{24}$ remaining values of $K_1$, guess all $2^{56}$ values of $K_2$ and do:

(i) Partially decrypt $E_{K_1}(P)$ under the guessed key $K_2$.

(ii) Further encrypt the result under $K_1$, and verify whether the result is equal to $C$. The correct $K = (K_1, K_2)$ should satisfy this due to (5). Repeat with another plaintext-ciphertext pair if necessary.

The first step requires $2^{32}$ known plaintexts while Step 2 requires $2^{32}$ related-key known ciphertexts. Step 3 requires $2^{56} \times 2^{32} = 2^{88}$ single DES encryptions, and $2^{88} \times 3 \approx 2^{89.6}$ words of memory. Step 4 is negligible while Step 5 requires $2^{24} \times 2^{56} \times 2 = 2^{81}$ single DES encryptions, and no memory. To summarize, we have an attack on two-key triple-DES that requires $2^{32}$ known plaintexts, $2^{32}$ related-key known ciphertexts, $2^{89.6}$ words of memory and $2^{88}$ single DES encryptions.



4 Related-Key Attacks on DESX Variants 



DESX encryption is denoted by:

$$C = K_b \oplus E_K(P \oplus K_a). \qquad (7)$$

In this section, we present related-key attacks on two DESX variants, namely DESX+ [9] and DES-EXE [6].

4.1 An Attack on DESX+

It was suggested in [9] to replace the XOR pre- and post-whitening steps in DESX by addition modulo $2^{64}$, to obtain the DESX variant which we call DESX+, denoted by:

$$C = K_b + E_K(P + K_a), \qquad (8)$$

where $+$ denotes addition modulo $2^{64}$. We show here that this variant can be attacked by a related-key attack. The key observation concerns two encryptions of the same plaintext $P$. The first is the DESX+ encryption of $P$ under the key $K = (K_a, K, K_b)$. The second is the DESX+ encryption of $P$ under the key $K' = (K_a, K, K'_b) = (K_a, K, K_b \oplus \Delta)$, where $K_b \oplus K'_b = \Delta$ is any arbitrary known difference. These two encryptions are related pictorially as in Fig. 4.
Fig. 4. Related-key differential attack on DESX+

Here, $+K_a$ denotes addition modulo $2^{64}$ with $K_a$. Notice that we start with the same plaintext $P$, and the two encryptions remain identical until just before the final addition of $K_b$ (respectively $K'_b$).

Based on this observation, our related-key differential attack is given by:

1. Obtain the DESX+ encryption of $P$ under the key $K = (K_a, K, K_b)$, and denote it $C$.

2. Obtain the DESX+ encryption of $P$ under the key $K' = (K_a, K, K'_b) = (K_a, K, K_b \oplus \Delta)$, and denote it $C'$.

3. Guess all $2^{64}$ values of $K_b$, and do:

(i) Compute $X = C - K_b$ (modulo $2^{64}$).

(ii) Compute $X' = C' - K'_b$ (modulo $2^{64}$), where $K'_b = K_b \oplus \Delta$.

(iii) If $X = X'$, then the guessed $K_b$ could be the right key value. The right key value always satisfies this condition, whereas a wrong key value satisfies it only with some probability; hence the number of possible values of $K_b$ is reduced. Wrong key values can easily be eliminated by trial encryption in a second analysis phase.

We have implemented this attack on a scaled-down generalization of DESX+, which we call FX32+, whose ciphertext $C$ is defined as:

$$C = K_b + F_K(P + K_a). \qquad (9)$$

Here, $F$ denotes a random function, and $P$, $C$, $K_a$, $K_b$ and $K$ are all 32 bits instead of 64. The execution takes just under 1 minute on a Pentium 4 1.8 GHz machine with 256 MB RAM, running Windows XP. The correct $K_b$ value is always suggested. The number of wrong key values suggested ranges from $O(1)$ to $O(2^{32})$, depending on the Hamming weight of the key difference $\Delta$: the higher the Hamming weight, the more efficient the filtering of wrong key values.

An anonymous referee remarked that the only difference between XOR and modular addition lies in the carries. If the addition with $K_b$ generates no carries, modular addition coincides with XOR, and the attack on DESX+ would then not work. This possible complication can be overcome by using a $\Delta$ with a large Hamming weight, or by repeating the attack with different plaintext-ciphertext pairs.
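A further scaled-down version of the same experiment can be sketched in a few lines. Everything here is an illustrative assumption: the 16-bit size, the name "FX16+" and the use of truncated SHA-256 as the keyed function $F_K$. The sketch is not the authors' FX32+ code. With only $2^{16}$ guesses for $K_b$, Step 3 runs in well under a second.

```python
import hashlib

BITS = 16
MOD = 1 << BITS


def F(k: int, x: int) -> int:
    """Keyed pseudorandom function on 16-bit values (stands in for the random function F_K)."""
    digest = hashlib.sha256(k.to_bytes(4, "big") + x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:2], "big")


def fx_plus(ka: int, k: int, kb: int, p: int) -> int:
    """Toy FX16+, cf. Eq. (9): C = K_b + F_K(P + K_a) mod 2^16."""
    return (kb + F(k, (p + ka) % MOD)) % MOD


def recover_kb(c: int, c_rel: int, delta: int) -> list:
    """Step 3: keep each guess g with C - g == C' - (g xor delta) (mod 2^16)."""
    return [g for g in range(MOD) if (c - g) % MOD == (c_rel - (g ^ delta)) % MOD]
```

The condition $X = X'$ reduces to $(g \oplus \Delta) - g \equiv C' - C \pmod{2^{16}}$, so the filter depends only on the carry pattern of $\Delta$. With $\Delta = 1$, every guess with the same parity as $K_b$ survives (half of all keys). With the all-ones $\Delta$, exactly two guesses survive ($K_b$ and $K_b \oplus 2^{15}$). This matches the Hamming-weight observation above.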

Once $K_b$ is obtained in this way, the remaining keys $K_a$ and $K$ can be obtained by an exhaustive search of $2^{120}$ single DES encryptions. But we can do better than that. We use $K_b$ to peel off the $+K_b$ operation, and apply a basic MITM attack on the remaining cipher that requires $2^{56}$ words of memory and $2^{56}$ DES encryptions [12]. Alternatively, we could reverse the roles of the plaintexts and ciphertexts, and repeat our attack to recover $K_a$ in a similar way. What remains is then a single DES, which can be attacked by exhaustive key search of $2^{56}$ values.

Step 3 of this attack requires $2^{64} \times 2 = 2^{65}$ modulo subtractions, which is negligible, so most of the work lies in the exhaustive key search of the remaining keys or in an MITM attack on the remaining double-DES.

In summary, we have a related-key differential attack on DESX+. It requires one known plaintext $P$, encrypted under the secret key $K$ and under the related key $K'$, and $2^{120}$ single DES encryptions. The work complexity is similar to that of the attack on DESX in [8], but the text complexity is much lower. Alternatively, our attack can work with the same text complexity but with $2^{56}$ words of memory and $2^{56}$ single DES encryptions. In this case, when memory is available, both the text and work complexities are much lower than those in [8].

Ironically, the original DESX with XOR for pre- and post-whitening is resistant to this attack. Therefore, this is the first attack for which the original DESX is stronger than DESX+. This is counter-intuitive since the common belief is that the XOR operation is weaker than modulo addition.²



4.2 Attacks on DES-EXE 

In [6], the authors posed the question of whether DES-EXE, a DES variant of the form

$$C = E_{K_b}(K \oplus E_{K_a}(P)), \qquad (10)$$

would be stronger or weaker than DESX. Note that DES-EXE is simply DESX with its XOR operations in the pre- and post-whitening stages interchanged with the DES encryption in the middle.

Consider a key $K = (K_a, K, K_b)$ and a related key $K' = (K_b, K, K_a)$. Then the encryptions under these two related keys can be slid as shown in Fig. 5.




Fig. 5. Sliding DES-EXE

Therefore, we have the slid equations:

$$P' = E_{K_a}(P) \oplus K, \qquad (11)$$

$$E^{-1}_{K_a}(C') = C \oplus K. \qquad (12)$$



² The exception is the work of Biham and Shamir [2,3], which showed how replacing XOR with addition in certain locations in DES can significantly weaken it.
XORing (11) and (12), we obtain:

$$P' \oplus E^{-1}_{K_a}(C') = E_{K_a}(P) \oplus C. \qquad (13)$$

A related-key slide attack proceeds as follows:

1. Obtain $2^{32}$ known plaintexts $P$ encrypted with DES-EXE under the key $K = (K_a, K, K_b)$.

2. Obtain another $2^{32}$ known plaintexts $P'$ encrypted with DES-EXE under the key $K' = (K_b, K, K_a)$. Store the values of $(P', C')$ in a table, T1. By the birthday paradox, we expect one slid pair $(P, C)$ and $(P', C')$ such that the slid equations (11) and (12), and hence (13), are satisfied.

3. Guess all $2^{56}$ values of $K_a$, and do:

(i) Compute $E_{K_a}(P) \oplus C$ for all $(P, C)$ and store $(E_{K_a}(P) \oplus C, K_a)$ in a table, T1.

(ii) Compute $P' \oplus E^{-1}_{K_a}(C')$ for all $(P', C')$ and store $(P' \oplus E^{-1}_{K_a}(C'), K_a)$ in a table, T2.

4. Search through T1 and T2 for a collision in the first entry, which immediately reveals the key $K_a$.

The remaining keys can be obtained via exhaustive search. Alternatively, we can use $K_a$ to peel off one layer and apply an MITM attack on the remaining two layers, requiring $2^{56}$ words of memory and $2^{56}$ DES encryptions.

Step 1 requires $2^{32}$ known plaintexts while Step 2 requires $2^{32}$ related-key known plaintexts. Step 3 requires $2^{56} \times 2^{32} \times 2 = 2^{89}$ single DES encryptions, and $2^{88} \times 3 \times 2 \approx 2^{90.6}$ words of memory. Step 4 is negligible. Meanwhile, exhaustive search of the remaining keys requires $2^{56} \times 2^{64} = 2^{120}$ single DES encryptions, whereas the alternative MITM attack requires $2^{56}$ memory and $2^{56}$ DES encryptions. To summarize, we have an attack on DES-EXE that requires $2^{32}$ known plaintexts, $2^{32}$ related-key known plaintexts, $2^{90.6}$ words of memory and $2^{89}$ single DES encryptions.

A better attack works as follows. Obtain the encryption $C$ of a plaintext $P$ under the key $K = (K_a, K, K_b)$. Then obtain the decryption $P'$ of $C$ under the key $K' = (K'_a, K, K_b) = (K_a \oplus \Delta, K, K_b)$, where $\Delta$ is any arbitrary known difference. We then get the situation indicated in Fig. 6.



Fig. 6. Related-key differential attack on DES-EXE

Here, $\oplus K$ denotes an XOR operation with the key $K$. The following relation then applies:

$$P' = E^{-1}_{K'_a}(E_{K_a}(P)). \qquad (14)$$
Table 1. Comparison of Attacks on Triple-DES Variants

Block Cipher       Texts                          Memory        DES Encryptions   Source
2-key Triple-DES   $2^{56}$ CP                    $2^{56}$      $2^{56}$          [12]
2-key Triple-DES   $m$ KP                         $m$           $2^{120}/m$       [13,14]
2-key Triple-DES   $2^{32}$ KP, $2^{32}$ RK-KC    $2^{89.6}$    $2^{88}$          This paper
3-key Triple-DES   3 CP                           $2^{56}$      $2^{112}$         [12]
3-key Triple-DES   1 KP, 1 RK-CC                  -             $2^{56}$          [7]
3-key Triple-DES   $2^{32}$ KP                    $2^{88}$      $2^{108}$         [11]
3-key Triple-DES   $2^{32}$ KP, $2^{32}$ RK-KC    $2^{33}$      $2^{88}$          This paper



Table 2. Comparison of Attacks on DESX Variants

Block Cipher   Texts                          Memory        DES Encryptions   Source
DESX           $2^{32}$ CP                    -             $2^{88}$          [5]
DESX           2 KP                           -             $2^{120}$         [5]
DESX           $2^{32}$ KP                    -                               [9,10]
DESX           2'^ RK-KP                      -             $2^{120}$         [8]
DESX           $2^{32.5}$ KP                  $2^{32.5}$    $2^{87.5}$        [4]
DESX+          1 KP, 1 RK-KP                  -             $2^{120}$         This paper
DESX+          1 KP, 1 RK-KP                  $2^{56}$      $2^{56}$          This paper
DES-EXE        $2^{32}$ KP, $2^{32}$ RK-KP    $2^{90.6}$    $2^{89}$          This paper
DES-EXE        1 KP, 1 RK-CC                  $2^{56}$      $2^{56}$          This paper



For all $2^{56}$ values of $K_a$, check whether (14) is satisfied; $K_a$ can thus be recovered with about $2^{56}$ encryptions. Use this to peel off the first layer, and apply an MITM attack on the remaining two layers, requiring $2^{56}$ memory and $2^{56}$ encryptions [12]. In summary, we require one known plaintext, one related-key chosen ciphertext, $2^{56}$ words of memory and $2^{56}$ single DES encryptions. This shows that DES-EXE is much weaker than the original DESX against a related-key differential attack.
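The key-recovery step based on relation (14) can be sketched at toy scale, under explicit assumptions. A 4-round Feistel network on 16-bit blocks with 12-bit keys stands in for DES. The whitening key is 16 bits and is called `kw` here to avoid a clash with Python naming; it plays the role of $K$. All names are hypothetical. The point of the sketch is that the whitening key and $K_b$ cancel out, so only $K_a$ has to be guessed.

```python
BLOCK_BITS = 16  # toy block size (DES: 64)
KEY_BITS = 12    # toy DES-key size (DES: 56)
MASK8 = 0xFF


def _f(k: int, r: int, i: int) -> int:
    # Arbitrary keyed round function; any choice yields an invertible Feistel cipher.
    return ((r * 167 + k * 29 + i * 71) ^ (k >> 3) ^ (r >> 3)) & MASK8


def enc(k: int, x: int) -> int:
    """Toy 4-round Feistel on 16-bit blocks, standing in for single DES."""
    l, r = x >> 8, x & MASK8
    for i in range(4):
        l, r = r, l ^ _f(k, r, i)
    return (l << 8) | r


def dec(k: int, y: int) -> int:
    l, r = y >> 8, y & MASK8
    for i in reversed(range(4)):
        l, r = r ^ _f(k, l, i), l
    return (l << 8) | r


def des_exe_enc(ka: int, kw: int, kb: int, p: int) -> int:
    """Toy DES-EXE, Eq. (10): C = E_Kb(K xor E_Ka(P)); kw plays the role of K."""
    return enc(kb, kw ^ enc(ka, p))


def des_exe_dec(ka: int, kw: int, kb: int, c: int) -> int:
    return dec(ka, kw ^ dec(kb, c))


def recover_ka(p: int, p_prime: int, delta: int) -> list:
    """Keep every K_a guess satisfying relation (14): P' = E^{-1}_{K_a xor delta}(E_{K_a}(P))."""
    return [g for g in range(1 << KEY_BITS) if dec(g ^ delta, enc(g, p)) == p_prime]
```

Usage: encrypt $P$ under $(K_a, K, K_b)$, decrypt the result under $(K_a \oplus \Delta, K, K_b)$, then call `recover_ka(P, P_prime, delta)`. The true $K_a$ is always among the candidates. A wrong guess passes only by chance, roughly with probability $2^{-16}$ per guess at this block size.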



5 Conclusions 

We have presented related-key slide attacks on 2-key and 3-key triple-DES. Our attacks are the first known related-key slide attacks on these triple-DES variants.

We have also presented attacks on DESX variants. In particular, we showed that, contrary to popular belief, DESX+ is weaker than DESX against a related-key differential attack; DESX+ is the DESX variant that uses addition modulo $2^{64}$ for its pre- and post-whitening. Our attacks on DES-EXE, another DESX variant with the outer XOR operations interchanged with the middle DES encryption, also show that DES-EXE is much weaker than the original DESX against related-key attacks. In Tables 1 and 2, we present a comparison of our attacks with previous attacks on variants of triple-DES and DESX.
Abstract. Recently, Barkan and Biham proposed the concept of dual ciphers and pointed out that there are 240 dual ciphers of AES (Dual AES). An interesting application of dual ciphers is to design a cipher which runs faster than the original cipher. In this paper, we first generalize the dual AES and propose a complete setup procedure to determine all dual ciphers. Then, a hardware implementation of AES based on the combination of dual cipher and composite field is proposed. We demonstrate that our AES design offers both better performance and a smaller area requirement than the design proposed by Wolkerstorfer et al., which uses a composite field only. Our results confirm Barkan et al.’s conjecture that AES can be implemented more efficiently than before.



1 Introduction 

In October 2000, NIST (National Institute of Standards and Technology) selected the Rijndael algorithm as the AES (Advanced Encryption Standard) [1] to replace the original encryption standard DES. The AES is expected to be the standard for the next 30 years. Since then, both software and hardware implementations of AES have been active topics in the literature [3,4,5,6,7,8,9,10,15,16]. Some papers [16] address how to improve software efficiency, while others [3,4,5,6,7,8,9,10,15] address hardware implementation.

The AES is a block cipher. The four building blocks of the AES cipher are ShiftRows, SubBytes (S-Box), MixColumns and AddRoundKey. The inverse function in SubBytes provides the non-linear operation of AES; it operates over the Galois field GF($2^8$). In software, however, executing the inverse operation takes a lot of time, so it is usually implemented with table lookup to accelerate execution. Table lookup is nevertheless inefficient in hardware: because the lookup table is built with ROM (read-only memory) cells, it needs a large chip area in an ASIC implementation. Increasing the efficiency of the inverse operation is therefore a very important aspect of implementation. Using combinational logic cells instead of ROM cells can reduce the chip size. Unfortunately, the circuitry for inversion over GF($2^8$) is complicated and needs many logic cells to synthesize the inverse operation.
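To make the cost discussion concrete, here is a sketch of the inversion that SubBytes needs, written directly over GF($2^8$) with the AES modulus $x^8+x^4+x^3+x+1$ (as specified in FIPS-197). It computes $a^{-1}$ as $a^{254}$ by square-and-multiply. This is one straightforward method; it is not the composite-field approach of [3,4]. The 256-entry table corresponds to the table-lookup approach used in software. The function names are illustrative.

```python
AES_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(2^8) modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # reduce when the degree reaches 8
            a ^= AES_MODULUS
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254 (a^255 = 1 for a != 0); maps 0 to 0 as SubBytes does."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


# A 256-entry table is what a software table-lookup implementation would store.
INV_TABLE = [gf_inv(a) for a in range(256)]
```

For example, `gf_mul(0x57, 0x83)` returns `0xC1` (the FIPS-197 multiplication example), and `gf_inv(0x53)` returns `0xCA`.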

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 25-38, 2004. 
However, the inverse operation over a composite field is simpler than that over the original field [13,14]. Rudra et al. [3] and Wolkerstorfer et al. [4] proposed using the composite field GF($(2^4)^2$) to implement the SubBytes operation of AES, demonstrating that it is practical to implement SubBytes with combinational logic cells. This approach reduces the chip size considerably.

Recently, Barkan and Biham [2] proposed the concept of dual ciphers and pointed out that there are 240 dual ciphers for the AES cipher. For example, let $E$ denote the AES cipher; then the ciphers $E$, $E^2$, $E^4$, $E^8$, $E^{16}$, $E^{32}$, $E^{64}$ and $E^{128}$ are dual ciphers. The existence of these dual ciphers broadens the investigation of the AES cipher. However, the implementation and speed of the dual ciphers have yet to be studied.

In this paper, we generalize the dual AES and propose a realizable setup procedure to determine all dual ciphers. We also discuss the hardware implementation of the ciphers. Our results show that combining the dual cipher with a composite field offers better speed with a smaller area requirement than using a composite field only.

In Section 2, we give a generalized description of the dual ciphers, followed by a complete procedure to find the dual ciphers. In Section 3, a model for the implementation of the original AES and the dual cipher is presented. In Section 4, we present our design philosophy for the whole cipher. Section 5 concludes the paper.

2 The Dual AES Cipher 

Barkan and Biham [2] first proposed the idea of dual ciphers of AES in 2002. 
The dual ciphers defined by Barkan and Biham are given as follows: 

Definition 1. [2] Two ciphers $E$ and $E'$ are called dual ciphers if they are
isomorphic, i.e., if there exist invertible transformations $f(\cdot)$, $g(\cdot)$
and $h(\cdot)$ such that $\forall P, K:\ f(E_K(P)) = E'_{g(K)}(h(P))$, where $P$ is
the plaintext and $K$ is the secret key.

If we rewrite the definition above as $E_K(P) = f^{-1}(E'_{g(K)}(h(P)))$, we can
express the relation between the dual AES and AES as illustrated in Figure 1. We
use the transformation $T(P)$ to express $h(P)$, the transformation $T^{-1}(C')$ to
express $f^{-1}(C')$, and $T(K)$ to express $g(K)$.

2.1 Generalized Representation of AES 

AES is a block cipher based on algebraic operations over the finite field
$GF(2^8)$. The dual cipher can be defined using the theory of finite fields.

Definition 2. [11] A mapping $f: G \to H$ of the group $G$ into the group $H$ is
called a homomorphism of $G$ into $H$ if $f$ preserves the operation of $G$. That
is, if $*$ and $\cdot$ are the operations of $G$ and $H$, respectively, then $f$
preserves the operation of $G$ if for all $a, b \in G$ we have
$f(a * b) = f(a) \cdot f(b)$. If $f$ is a one-to-one homomorphism of $G$ onto $H$,
then $f$ is called an isomorphism and we say that $G$ and $H$ are isomorphic. An
isomorphism of $G$ onto $G$ is called an automorphism.
[Figure: AES maps plaintext $P$ to ciphertext $C$ under key $K$; the dual AES maps $P' = T(P)$ to $C' = T(C)$ under $K' = T(K)$.]

Fig. 1. The relation between AES and dual AES



This definition can be applied to mappings between rings. A mapping
$\psi: R \to S$ from a ring $R$ into a ring $S$ is called a homomorphism if for
any $a, b \in R$ we have

$$\psi(a + b) = \psi(a) + \psi(b), \qquad (1)$$

and

$$\psi(ab) = \psi(a)\psi(b). \qquad (2)$$

Theorem 1. [11] The distinct automorphisms of $GF(q^m)$ over $GF(q)$ are exactly
the mappings $\sigma_0, \sigma_1, \ldots, \sigma_{m-1}$, defined by
$\sigma_j(\alpha) = \alpha^{q^j}$ for $\alpha \in GF(q^m)$ and $j = 0, 1, \ldots, m-1$.

The elements $\alpha, \alpha^q, \ldots, \alpha^{q^{m-1}}$ are called the conjugates
of $\alpha$ over $GF(q)$.

Over $GF(2^8)$, there are 8 elements in the set of conjugates of $\alpha$, namely
$\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{128}$. For each irreducible polynomial,
there are $\varphi(255) = 128$ primitive elements in $GF(2^8)$ [11]. These 128
primitive elements fall into sets of 8 conjugates, so there are 16 sets of
conjugates of primitive elements over $GF(2^8)$.

For AES, the irreducible polynomial is $R(x) = x^8 + x^4 + x^3 + x + 1$
(hexadecimal notation $\{11B\}_x$). Sixteen sets of conjugates can be determined
over $GF(2^8)$. However, only one set satisfies equations (1) and (2) when the $+$
and $\times$ operations are performed within AES. The set is
$\{\{03\}_x, \{05\}_x, \{11\}_x, \{1A\}_x, \{4C\}_x, \{5F\}_x, \{E5\}_x, \{FB\}_x\}$.
In a finite field, a generator is an element whose powers generate all the
nonzero elements of the field. If we take the generator $\beta = \{03\}_x$, we
denote this AES as $\{GF(2^8), \{03\}_x\}$, or simply as $\{\{11B\}_x, \{03\}_x\}$.
The set of 8-bit elements $\{00, 01, \ldots, FF\}$ is mapped onto the set of power
representations of $\beta$: $\{0, 1, \beta, \beta^2, \ldots, \beta^{254}\}$. So we can
express the original AES in terms of powers of the generator $\beta = \{03\}_x$.
We call this a generalized representation of AES.
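The counts in this subsection can be checked mechanically. The following minimal Python sketch is our own illustration, not code from the paper. It assumes that bytes encode polynomials with bit k as the coefficient of $x^k$, so $R(x)$ is the integer 0x11B. The helper names `gf_mul`, `order` and `conjugates` are hypothetical. The sketch counts the primitive elements and their conjugate sets, and lists the conjugates of $\{03\}_x$.

```python
def gf_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8) = GF(2)[x]/(poly)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # degree 8 reached: reduce by R(x)
            a ^= poly
    return r


def order(a: int, poly: int = 0x11B) -> int:
    """Multiplicative order of a nonzero element a."""
    x, n = a, 1
    while x != 1:
        x = gf_mul(x, a, poly)
        n += 1
    return n


def conjugates(a: int, poly: int = 0x11B) -> frozenset:
    """The conjugates a, a^2, a^4, ..., a^128 of a over GF(2)."""
    out, x = set(), a
    for _ in range(8):
        out.add(x)
        x = gf_mul(x, x, poly)
    return frozenset(out)


primitives = [a for a in range(1, 256) if order(a) == 255]
classes = {conjugates(a) for a in primitives}

print(len(primitives), len(classes))          # 128 16
print(sorted(f"{v:02X}" for v in conjugates(0x03)))
# ['03', '05', '11', '1A', '4C', '5F', 'E5', 'FB']
```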
We can choose any conjugate to generate a dual AES. Different generators give
different forms of dual ciphers. Over $GF(2^8)$ there are 30 irreducible
polynomials of degree 8 [11]. For each polynomial, only one set of conjugates is
suitable as generators. Since there are 8 elements in that set, there are
$30 \times 8 = 240$ different forms of dual ciphers. We simply call a dual cipher
of AES a dual AES.
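The figure of 30 irreducible polynomials, and hence $30 \times 8 = 240$ dual ciphers, can be reproduced by trial division. The small Python sketch below is our own illustration. It assumes that polynomials over GF(2) are encoded as integers with bit k as the coefficient of $x^k$. The names `poly_mod` and `is_irreducible` are hypothetical.

```python
def poly_mod(a: int, m: int) -> int:
    """Remainder of a modulo m for polynomials over GF(2) encoded as ints."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def is_irreducible(p: int) -> bool:
    """Trial division by every polynomial of degree 1 .. deg(p) // 2."""
    deg = p.bit_length() - 1
    return all(poly_mod(p, d) != 0 for d in range(2, 1 << (deg // 2 + 1)))


# Monic degree-8 polynomials are the integers 0x100 .. 0x1FF.
irreducible = [p for p in range(0x100, 0x200) if is_irreducible(p)]
print(len(irreducible), 8 * len(irreducible))     # 30 240
print([hex(p) for p in irreducible[:2]])          # ['0x11b', '0x11d']
```

Trial division only up to degree 4 is enough, because any factorization of a degree-8 polynomial has a factor of degree at most 4.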

In Appendix A we list the 240 pairs $\{R(x), \beta\}$ obtained by the setup
procedure described in Section 2.2.

2.2 Mapping from AES to Dual AES over $GF(2^8)$

The original AES $\{\{11B\}_x, \{03\}_x\}$ can be mapped to a generalized AES,
i.e., to the power form of the generator $\{03\}_x$. This generalized AES can be
mapped onto another dual AES with a different generator or polynomial. The AES
cipher and this dual cipher are isomorphic. Because the dual ciphers must
satisfy equations (1) and (2), they can be determined easily.

Next, we propose a method to determine the dual AES $\{R(x), \beta\}$, where
$\beta$ is the generator and $R(x)$ is the polynomial. Assume $R(x)$ is known and
$\beta$ is to be determined. The setup procedure is as follows:

1. Check whether an element $\alpha \in \{\{02\}_x, \ldots, \{FF\}_x\}$ is
primitive. If it is not primitive, delete it from the set
$\{\{02\}_x, \ldots, \{FF\}_x\}$. Repeat this procedure until the whole set has
been checked. The residual set will have 128 elements.

2. Pick one element from the residual set as the candidate generator $\alpha$.
Choose any two power-form elements $\{03\}_x^i$ and $\{03\}_x^j$ of the original
AES such that $\{03\}_x^j = \{03\}_x^i + 1$, and assume that they map to
$p = \alpha^i$ and $q = \alpha^j$ in the dual AES, respectively. For
$p \in \{\{02\}_x, \ldots, \{FF\}_x\}$, check whether the equation $q = p + 1$
holds. If it passes the test, a generator of the dual AES has been found;
proceed directly to Step 4.

3. Check the other conjugates $\alpha^{2^l}$, $l = 1, 2, \ldots, 7$, to see whether
one of them is a generator. If any one of these works, a generator has been
found and you can proceed to Step 4. If all of them fail, return to Step 2 and
test another element.

4. The primitive element $\alpha$ can be used as a generator for building a dual
AES, as can its other conjugates $\alpha^{2^l}$, $l = 1, 2, \ldots, 7$. After
choosing one of them as the generator $\beta$ for the mapping from the AES to the
dual AES, we form the matrix $T = [\beta^0, \beta^{25}, \beta^{50}, \ldots, \beta^{175}]$,
where the $\beta^{25k}$ are written as binary column vectors. Column $k$ is the
image of $x^k = \{02\}_x^k = \{03\}_x^{25k}$. The inverse matrix $T^{-1}$, mapping
from the dual AES to the AES, is obtained by inverting $T$.

5. The overall operations of the dual AES $\{R(x), \beta\}$ are obtained as
follows.
AddRoundKey. This function keeps the same operation as in the original AES:
it is still a simple XOR of the intermediate state with the round key.

ShiftRows. This function also keeps the same operation as in the original AES.
That is, ShiftRows still cyclically shifts the rows with the same offsets, and
likewise for InvShiftRows.
MixColumns. In MixColumns, each column of the state is multiplied by the
polynomial $\beta^{25} + \beta^0 x + \beta^0 x^2 + \beta^1 x^3$, i.e., by the four
elements $\beta^{25}, \beta^0, \beta^0, \beta^1$ (the images of the AES coefficients
$\{02\}_x, \{01\}_x, \{01\}_x, \{03\}_x$). For InvMixColumns, the four elements are
$[\beta^{223}, \beta^{199}, \beta^{238}, \beta^{104}]$, the images of
$\{0E\}_x, \{09\}_x, \{0D\}_x, \{0B\}_x$.

SubBytes. SubBytes is composed of InverseMapping, Affine Transformation and
Inv Affine Transformation. The InverseMapping function of SubBytes and
InvSubBytes keeps the same operation as in the original AES, but with the
irreducible polynomial $R(x)$. The Affine Transformation of SubBytes is
$G'(y) = T(Const) + (T \cdot Affine \cdot T^{-1}) \cdot y$, where $Affine$ is the same
$8 \times 8$ matrix and $Const = \{63\}_x$ as in the original AES.

Using this setup procedure, we can obtain any dual AES cipher $\{R(x), \beta\}$.
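The five steps can be prototyped directly. The Python sketch below is our own illustration of the procedure. The function names are hypothetical and do not come from the paper. It assumes that bytes encode polynomials with bit k as the coefficient of $x^k$, and that 8x8 matrices are stored by columns.

Testing $q = p + 1$ for every $p$ is enough here. The map $\{03\}_x^i \mapsto \alpha^i$ is multiplicative by construction. A multiplicative map that commutes with adding 1 also preserves general addition, because $a + b = b(a/b + 1)$ for $b \neq 0$.

The last lines apply Step 5 to the dual AES $\{\{11D\}_x, \{02\}_x\}$. They use $T^{-1} = T$, which holds for this particular dual; the code asserts it.

```python
def gf_mul(a: int, b: int, poly: int) -> int:
    """Multiply two bytes in GF(2)[x]/(poly)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def powers(g: int, poly: int) -> list:
    """[g^0, g^1, ..., g^254]; 255 distinct values iff g is primitive."""
    out, x = [], 1
    for _ in range(255):
        out.append(x)
        x = gf_mul(x, g, poly)
    return out

AES_POW = powers(0x03, 0x11B)                  # power form {03}^i of AES
AES_LOG = {v: i for i, v in enumerate(AES_POW)}

def dual_map(alpha: int, poly: int) -> list:
    """Lookup table of {03}^i -> alpha^i (and 0 -> 0)."""
    ap = powers(alpha, poly)
    phi = [0] * 256
    for i, v in enumerate(AES_POW):
        phi[v] = ap[i]
    return phi

def find_generators(poly: int) -> list:
    """Steps 1-3: primitive alpha for which q = p + 1 holds for every p."""
    found = []
    for alpha in range(2, 256):
        if len(set(powers(alpha, poly))) != 255:      # step 1: not primitive
            continue
        phi = dual_map(alpha, poly)
        if all(phi[p ^ 1] == phi[p] ^ 1 for p in range(256)):
            found.append(alpha)
    return found

def t_columns(beta: int, poly: int) -> list:
    """Step 4: T = [beta^0, beta^25, ..., beta^175] because {02} = {03}^25."""
    bp = powers(beta, poly)
    return [bp[AES_LOG[0x02] * k % 255] for k in range(8)]

def mat_apply(cols: list, a: int) -> int:
    """Multiply the GF(2) matrix with columns cols[0..7] by the byte a."""
    r = 0
    for k in range(8):
        if (a >> k) & 1:
            r ^= cols[k]
    return r

def aes_affine(b: int) -> int:
    """AES affine map: bit i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ {63}_i."""
    r = 0x63
    for i in range(8):
        for k in (0, 4, 5, 6, 7):
            r ^= ((b >> ((i + k) % 8)) & 1) << i
    return r

def gf_inv(a: int, poly: int) -> int:
    """a^254, i.e. a^-1 for a != 0 and 0 for a == 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a, poly)
    return r

gens = find_generators(0x11D)
T = t_columns(0x02, 0x11D)
assert all(mat_apply(T, mat_apply(T, a)) == a for a in range(256))   # T^-1 = T

def aes_sbox(a: int) -> int:
    return aes_affine(gf_inv(a, 0x11B))

def dual_sbox(a: int) -> int:
    """Step 5: inversion mod R(x), then G'(y) = T(Const) + (T.Affine.T^-1) y."""
    return mat_apply(T, aes_affine(mat_apply(T, gf_inv(a, 0x11D))))

print([f"{g:02X}" for g in gens])   # ['02', '04', '10', '1D', '4C', '5F', '85', '9D']
print([f"{c:02X}" for c in T])      # ['01', '03', '05', '0F', '11', '33', '55', 'FF']
```

Because the dual S-box is defined through $T$, it satisfies $S'(T(x)) = T(S(x))$ for every byte $x$. This conjugation check is a convenient end-to-end test of any $\{R(x), \beta\}$ produced by the procedure.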

3 Optimizing the SubBytes Implementation 

Among the four functions in AES, SubBytes is the most important because it
provides the only nonlinear transformation. From the implementation point of
view, the efficiency of an AES hardware implementation is mainly determined by
the implementation of SubBytes. In this section, we only discuss the design of
SubBytes.

3.1 SubBytes Implementation of the AES Proposed by 
Wolkerstorfer et al 

The original AES implemented with a composite field is discussed in [4]. The
subfields and the polynomials are usually selected as follows [4]:

$$GF(2^4):\ Q(y) = y^4 + y + 1,$$
$$GF((2^4)^2):\ P(z) = z^2 + z + \{E\}_x,$$

where $GF(2^4)$ is defined by $Q(y)$ and $GF((2^4)^2)$ by $P(z)$. Most of the
functionality of the SubBytes architecture can be implemented using two-input
XOR gates. The efficiency of a SubBytes implementation can be measured in
terms of space complexity and time complexity.
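Whether a composite-field choice is valid can be checked quickly. $Q(y)$ must be irreducible, and in fact it is primitive. $P(z) = z^2 + z + \lambda$ must have no root in $GF(2^4)$, because a quadratic is irreducible exactly when it has no root.

The Python sketch below is our own illustration, and its helper names are hypothetical. It assumes that nibbles encode polynomials in $y$ with bit k as the coefficient of $y^k$. It checks $\lambda = \{E\}_x$, and also the constant $\{9\}_x$ that is used later for the dual AES.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply two nibbles in GF(2^4) = GF(2)[y]/(y^4 + y + 1)."""
    r = 0
    for k in range(4):
        if (b >> k) & 1:
            r ^= a
        a <<= 1
        if a & 0x10:           # reduce y^4 -> y + 1
            a ^= 0x13
    return r


def has_root(lam: int) -> bool:
    """True if z^2 + z + lam has a root in GF(2^4), i.e. P(z) is reducible."""
    return any(gf16_mul(z, z) ^ z ^ lam == 0 for z in range(16))


# y = {2} has multiplicative order 15, so Q(y) is primitive (hence irreducible).
x, order = 0x2, 1
while x != 1:
    x, order = gf16_mul(x, 0x2), order + 1

print(order)                              # 15
print(has_root(0xE), has_root(0x9))       # False False
```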

In Table 1, the space complexity of SubBytes is measured by $\#XOR$ (the count of
XOR gates) and the time complexity by $t_{XOR}$ (the total delay accumulated
along the critical path). The total XOR gate count is 123. The critical path for
encryption is composed of 18 XOR gates; for decryption it is only 17, because
the Inv Affine Transformation has lower complexity.

3.2 The Proposed SubBytes Implementation of the Dual AES 

We can see in Figure 1 that if we want to implement the AES via the dual AES,
the cost of the transformations $T$ and $T^{-1}$ must be taken into consideration.
This extra cost can be lowered by reconfiguring the hardware structure. In this
section, we only discuss the SubBytes implementation of the dual AES.
Table 1. Complexities of the SubBytes designed by Wolkerstorfer et al.

Result                     Value
Max delay (Encryption)     18
Max delay (Decryption)     17
Sum of XOR gates           123



The Efficient Architecture of SubBytes. Because the cipher must support both
encryption and decryption, SubBytes and InvSubBytes must be designed within the
same chip. The InverseMapping function can be shared between SubBytes and
InvSubBytes. In the following discussion, we simply write “SubBytes” to
represent both SubBytes and InvSubBytes. The design criterion for SubBytes is
to minimize the cost of the components used, i.e., the sum of the costs of
InverseMapping, Affine Transformation and Inv Affine Transformation. For the
design of the dual AES with its composite field, the design cost of SubBytes,
denoted by $C_S$, is:

$$C_S = Cost(Affine) + Cost(InvAffine) + Cost(M) + Cost(M^{-1}) + Cost(ConstMul) \qquad (3)$$

where the mapping matrix $M$ denotes the mapping from $GF(2^8)$ to $GF((2^4)^2)$
and $M^{-1}$ denotes the reverse mapping. $ConstMul$ denotes a constant
multiplication over $GF(2^4)$, which is required to compute the inversion. The
cost analysis is discussed further in Section 4.



Our Design of SubBytes. Proper selection of the dual AES can minimize the total
delay along the critical path as well as reduce the gate counts of the SubBytes
and InvSubBytes blocks. Because there are only 240 dual AES ciphers in total, we
were able to use exhaustive search to find the optimal dual AES and composite
field. In our design, the following fields and polynomials were selected as
optimal:

$$GF(2^8):\ R(x) = x^8 + x^4 + x^3 + x^2 + 1,$$
$$GF(2^4):\ Q(y) = y^4 + y + 1,$$
$$GF((2^4)^2):\ P(z) = z^2 + z + \{9\}_x.$$
The dual AES $\{\{11D\}_x, \{02\}_x\}$ is used, and $P(z) = z^2 + z + \{9\}_x$ is the
irreducible polynomial over $GF(2^4)$ that defines $GF((2^4)^2)$. We describe the
three functions of the SubBytes of our design below.

1. Affine Transformation. For the dual AES $\{\{11D\}_x, \{02\}_x\}$, the Affine
Transformation can be described by the following equations, where $\sim(\cdot)$
denotes bitwise inversion:

$$b = \text{Affine-Tran.}(c), \quad b, c \in GF(2^8)$$
$$\begin{aligned}
b_0 &= c_0, & b_4 &= c_0 \oplus c_1 \oplus c_4,\\
b_1 &= c_1, & b_5 &= \sim(c_1 \oplus c_2 \oplus c_5),\\
b_2 &= \sim(c_2), & b_6 &= \sim(c_2 \oplus c_3 \oplus c_6),\\
b_3 &= c_0 \oplus c_3, & b_7 &= c_3 \oplus c_4 \oplus c_7.
\end{aligned}$$



2. Inv Affine Transformation. For the dual AES $\{\{11D\}_x, \{02\}_x\}$, the Inv
Affine Transformation can be described by the following equations:

$$b = \text{Affine-Tran.}^{-1}(c), \quad b, c \in GF(2^8)$$
$$\begin{aligned}
b_0 &= c_0, & b_4 &= c_0 \oplus (c_1 \oplus c_4),\\
b_1 &= c_1, & b_5 &= c_1 \oplus c_2 \oplus c_5,\\
b_2 &= \sim(c_2), & b_6 &= (c_0 \oplus c_3) \oplus c_2 \oplus c_6,\\
b_3 &= (c_0 \oplus c_3), & b_7 &= c_1 \oplus c_4 \oplus c_3 \oplus c_7.
\end{aligned}$$



3. InverseMapping. The inversion is computed over the composite field
$GF((2^4)^2)$. The InverseMapping consists of two main components: the
computation over $GF((2^4)^2)$ and the mapping matrices $M$ and $M^{-1}$.
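$$M = \begin{bmatrix}
1&1&0&1&0&0&0&0\\
0&1&0&0&0&1&0&0\\
0&1&0&1&1&0&0&1\\
0&1&1&0&0&1&0&0\\
0&0&0&0&1&0&0&1\\
0&1&0&1&1&0&0&0\\
0&0&1&0&0&1&1&0\\
0&0&0&0&0&1&0&0
\end{bmatrix}, \qquad
M^{-1} = \begin{bmatrix}
1&0&1&0&1&0&0&0\\
0&1&0&0&0&0&0&1\\
0&1&0&1&0&0&0&0\\
0&1&1&0&1&0&0&1\\
0&0&1&0&1&1&0&0\\
0&0&0&0&0&0&0&1\\
0&1&0&1&0&0&1&1\\
0&0&1&0&0&1&0&0
\end{bmatrix}$$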

The inversion over $GF(2^8)$ is computed in the composite field $GF((2^4)^2)$.
An element $a \in GF(2^8)$ can be represented as a two-term polynomial
$a = a_h z + a_l$, where $a_h, a_l \in GF(2^4)$.

To compute the inversion, we require
$(a_h z + a_l)(a'_h z + a'_l) = \{0\}_x z + \{1\}_x$, where
$a_h, a_l, a'_h, a'_l \in GF(2^4)$. Reducing with $z^2 = z + \{9\}_x$, the inversion
can then be derived as:

$$(a_h z + a_l)^{-1} = a'_h z + a'_l = (a_h \otimes d)\, z + (a_l \oplus a_h) \otimes d,$$
$$d = \left((a_h^2 \otimes \{9\}_x) \oplus (a_h \otimes a_l) \oplus a_l^2\right)^{-1}.$$

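For the element $a \in GF(2^8)$ and $a_h, a_l \in GF(2^4)$, the matrix $M$
represents the transfer from $a$ to $a_h z + a_l$. The mapping function of the
matrix $M$ is given by the function $\text{Mapping}(\cdot)$:

$$a_h z + a_l = \text{Mapping}(a), \quad a_h, a_l \in GF(2^4),\ a \in GF(2^8)$$
$$\begin{aligned}
a_{l0} &= a_0 \oplus (a_1 \oplus a_3), & a_{h0} &= a_4 \oplus a_7,\\
a_{l1} &= a_1 \oplus a_5, & a_{h1} &= (a_1 \oplus a_3) \oplus a_4,\\
a_{l2} &= (a_1 \oplus a_3) \oplus (a_4 \oplus a_7), & a_{h2} &= (a_2 \oplus a_5) \oplus a_6,\\
a_{l3} &= a_1 \oplus (a_2 \oplus a_5), & a_{h3} &= a_5.
\end{aligned}$$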

The inverse of $a \in GF(2^8)$ can be replaced by the inverse of $a_h z + a_l$,
where $a_h, a_l \in GF(2^4)$. Computing the inversion over $GF((2^4)^2)$ requires
1 inversion, 3 general multiplications, 2 squarings, 3 additions and 1 constant
multiplication over $GF(2^4)$. All of these operations are expressed in the
equations below.

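A. Inversion. The inverse $a^{-1}$ of an element $a \in GF(2^4)$ can be derived by
solving the equation $a \cdot q \bmod Q(y) = 1$, where $q \in GF(2^4)$. The solution
is given as follows:

$$q = a^{-1} \bmod Q(y), \quad a, q \in GF(2^4)$$
$$\begin{aligned}
q_0 &= a_0 \oplus a_1 \oplus a_2 \oplus a_0a_2 \oplus a_1a_2 \oplus a_0a_1a_2 \oplus a_3 \oplus a_1a_2a_3\\
q_1 &= a_0a_1 \oplus a_0a_2 \oplus a_1a_2 \oplus a_3 \oplus a_1a_3 \oplus a_0a_1a_3\\
q_2 &= a_0a_1 \oplus a_2 \oplus a_0a_2 \oplus a_3 \oplus a_0a_3 \oplus a_0a_2a_3\\
q_3 &= a_1 \oplus a_2 \oplus a_3 \oplus a_0a_3 \oplus a_1a_3 \oplus a_2a_3 \oplus a_1a_2a_3
\end{aligned}$$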

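B. Multiplication. Multiplication over $GF(2^4)$ is given by:

$$q = a \otimes b = a \cdot b \bmod Q(y), \quad a, b, q \in GF(2^4)$$
$$\begin{aligned}
q_0 &= a_0b_0 \oplus a_3b_1 \oplus a_2b_2 \oplus a_1b_3\\
q_1 &= a_1b_0 \oplus (a_0 \oplus a_3)b_1 \oplus (a_2 \oplus a_3)b_2 \oplus (a_1 \oplus a_2)b_3\\
q_2 &= a_2b_0 \oplus a_1b_1 \oplus (a_0 \oplus a_3)b_2 \oplus (a_2 \oplus a_3)b_3\\
q_3 &= a_3b_0 \oplus a_2b_1 \oplus a_1b_2 \oplus (a_0 \oplus a_3)b_3
\end{aligned}$$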
C. Squaring. The squaring operation is given by:

$$q = a^2 \bmod Q(y), \quad a, q \in GF(2^4)$$
$$q_0 = a_0 \oplus a_2, \quad q_1 = a_2, \quad q_2 = a_1 \oplus a_3, \quad q_3 = a_3$$

D. Multiplication by the constant $\{9\}_x$. Multiplication by $\{9\}_x$ is required
in the inversion computation. This constant multiplication is given by:

$$q = a \otimes \{9\}_x \bmod Q(y), \quad a, q \in GF(2^4)$$
$$q_0 = a_0 \oplus a_1, \quad q_1 = a_2, \quad q_2 = a_3, \quad q_3 = a_0$$
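The bit-level formulas A-D, the matrices $M$ and $M^{-1}$, and the affine equations can be cross-checked in software before synthesis. The Python sketch below is our own test harness, not the authors' netlist, and its names are hypothetical. It makes these assumptions:

- Bit k of a byte or nibble is the coefficient of $x^k$ (or $y^k$).
- Row $i$ of $M$ gives output bit $i$, with outputs 0-3 being $a_{l0..l3}$ and outputs 4-7 being $a_{h0..h3}$.
- $T$ (Section 4.2) is entered by its columns.

The harness checks that the composite-field InverseMapping really inverts elements of $GF(2)[x]/(x^8 + x^4 + x^3 + x^2 + 1)$. It also checks the resulting dual SubBytes against a few known AES S-box values through $S'(T(x)) = T(S(x))$.

```python
def bits(v: int, n: int) -> list:
    return [(v >> i) & 1 for i in range(n)]

def pack(bs: list) -> int:
    return sum(b << i for i, b in enumerate(bs))

def g16_mul(a: int, b: int) -> int:          # B. multiplication mod Q(y)
    a0, a1, a2, a3 = bits(a, 4)
    b0, b1, b2, b3 = bits(b, 4)
    return pack([a0 & b0 ^ a3 & b1 ^ a2 & b2 ^ a1 & b3,
                 a1 & b0 ^ (a0 ^ a3) & b1 ^ (a2 ^ a3) & b2 ^ (a1 ^ a2) & b3,
                 a2 & b0 ^ a1 & b1 ^ (a0 ^ a3) & b2 ^ (a2 ^ a3) & b3,
                 a3 & b0 ^ a2 & b1 ^ a1 & b2 ^ (a0 ^ a3) & b3])

def g16_inv(a: int) -> int:                  # A. inversion (0 maps to 0)
    a0, a1, a2, a3 = bits(a, 4)
    return pack([a0 ^ a1 ^ a2 ^ a0 & a2 ^ a1 & a2 ^ a0 & a1 & a2 ^ a3 ^ a1 & a2 & a3,
                 a0 & a1 ^ a0 & a2 ^ a1 & a2 ^ a3 ^ a1 & a3 ^ a0 & a1 & a3,
                 a0 & a1 ^ a2 ^ a0 & a2 ^ a3 ^ a0 & a3 ^ a0 & a2 & a3,
                 a1 ^ a2 ^ a3 ^ a0 & a3 ^ a1 & a3 ^ a2 & a3 ^ a1 & a2 & a3])

def g16_sq(a: int) -> int:                   # C. squaring
    a0, a1, a2, a3 = bits(a, 4)
    return pack([a0 ^ a2, a2, a1 ^ a3, a3])

def g16_mul9(a: int) -> int:                 # D. multiplication by {9}
    a0, a1, a2, a3 = bits(a, 4)
    return pack([a0 ^ a1, a2, a3, a0])

M_ROWS = ["11010000", "01000100", "01011001", "01100100",
          "00001001", "01011000", "00100110", "00000100"]
MINV_ROWS = ["10101000", "01000001", "01010000", "01101001",
             "00101100", "00000001", "01010011", "00100100"]

def mat_rows(rows: list, a: int) -> int:
    """Output bit i = XOR of the input bits j with rows[i][j] == '1'."""
    x = bits(a, 8)
    return pack([sum(x[j] for j, c in enumerate(r) if c == "1") & 1 for r in rows])

def inverse_mapping(a: int) -> int:
    """M, then (ah z + al)^-1 = (ah*d) z + (ah ^ al)*d over GF((2^4)^2), then M^-1."""
    c = mat_rows(M_ROWS, a)
    ah, al = c >> 4, c & 0xF
    d = g16_inv(g16_mul9(g16_sq(ah)) ^ g16_mul(ah, al) ^ g16_sq(al))
    return mat_rows(MINV_ROWS, (g16_mul(ah, d) << 4) | g16_mul(ah ^ al, d))

def affine(c: int) -> int:                   # dual Affine Transformation
    b = bits(c, 8)
    return pack([b[0], b[1], 1 ^ b[2], b[0] ^ b[3], b[0] ^ b[1] ^ b[4],
                 1 ^ b[1] ^ b[2] ^ b[5], 1 ^ b[2] ^ b[3] ^ b[6], b[3] ^ b[4] ^ b[7]])

def inv_affine(c: int) -> int:               # dual Inv Affine Transformation
    b = bits(c, 8)
    return pack([b[0], b[1], 1 ^ b[2], b[0] ^ b[3], b[0] ^ b[1] ^ b[4],
                 b[1] ^ b[2] ^ b[5], b[0] ^ b[3] ^ b[2] ^ b[6], b[1] ^ b[4] ^ b[3] ^ b[7]])

def gf_mul_11d(a: int, b: int) -> int:       # reference multiply mod R(x) = {11D}
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return r

T_COLS = [0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF]   # columns of T = T^-1

def t_apply(a: int) -> int:
    r = 0
    for k in range(8):
        if (a >> k) & 1:
            r ^= T_COLS[k]
    return r

def dual_subbytes(a: int) -> int:
    return affine(inverse_mapping(a))

print(all(gf_mul_11d(a, inverse_mapping(a)) == 1 for a in range(1, 256)))   # True
```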

Table 2 shows the implementation complexities of our SubBytes design with the
dual AES $\{\{11D\}_x, \{02\}_x\}$. Comparing these results with those of Table 1,
we can see that the design with the dual AES is superior to that with the
standard AES.



Table 2. Complexities of our SubBytes design

Max delay (E/D)            15/15
C_S(t_XOR) (E/D)           7/7
Sum of XOR gates           101
C_S(#XOR)                  37



The overall number of two-input XOR gates is 101. The critical paths for
encryption and decryption both consist of 15 XOR gates in series. Compared with
Wolkerstorfer et al.’s design, the total XOR gate count is reduced from 123 to
101 (about 18%), and the critical-path delay is reduced from 18 to 15 XOR gates
for encryption (about 17%) and from 17 to 15 for decryption (about 12%). In
Table 3, we compare our design with that of Wolkerstorfer et al. The cost
analysis of $C_S(\#XOR)$ and $C_S(t_{XOR})$ is discussed in the next section.

In Section 3.1, we discussed the implementation of SubBytes based on the
original AES; in this section, we discussed the implementation of SubBytes based
on the dual AES $\{\{11D\}_x, \{02\}_x\}$. Both use a composite field. Our findings
indicate that implementing SubBytes with the dual AES is more efficient.








S.-Y. Wu, S.-C. Lu, and C.S. Laih 



Table 3. Comparison between our design of SubBytes and Wolkerstorfer et al.’s

Design                   C_S(#XOR)   C_S(t_XOR) (E/D)   Max delay (Enc.)   Max delay (Dec.)   Sum of XOR gates
Wolkerstorfer et al.’s   59          10/9               18                 17                 123
Our design               37          7/7                15                 15                 101



4 Design of AES via Dual Cipher 

If we want to implement the whole AES as an ASIC, there are several possible
architectural styles [10]. In this paper, we focus on the two most commonly used:
iterative circuits and pipelined circuits. Different circuitry or a different
choice of building blocks can lead to differences in space or time complexity.
For simplicity, we limit our analysis of design complexity to cost. Note that
the cost of the key expansion block is not included in our discussion.



4.1 Cost Analysis 

The mapping from the AES to the dual AES is shown in Figure 1. Now we want to
implement an AES via a dual AES. We must take into account the additional cost
of the transformations $T$ and $T^{-1}$ that the dual AES requires, and this cost
must be kept as low as possible. Let $Cost(\phi)$ denote the cost of operation
$\phi$. The cost of SubBytes is denoted by $C_S$, as described in equation (3).

Compare the building blocks of the dual AES with those of AES. Because the
AddRoundKey, ShiftRows and InvShiftRows blocks perform the same functions, we
can ignore their cost difference between the AES and the dual AES. In
MixColumns and InvMixColumns, each column state is multiplied by a specific
polynomial. These functions work like multiplications by constants over
$GF(2^8)$. Hence, the design criterion for these functions is to minimize the
cost $C_P$, given by:

$$C_P = C_{MC} + C_{IMC},$$

where

$$C_{MC} = Cost(\times T(02)) + Cost(\times T(03))$$

and

$$C_{IMC} = Cost(\times T(0E)) + Cost(\times T(09)) + Cost(\times T(0D)) + Cost(\times T(0B)).$$

Here the notation $\times(\cdot)$ denotes constant multiplication, and $C_{MC}$ and
$C_{IMC}$ denote the costs of MixColumns and InvMixColumns, respectively.

We use the notation $C_T$ to denote the combined cost of the transformation
matrix $T$ (denoted by $Cost(T)$) and its inverse $T^{-1}$ (denoted by
$Cost(T^{-1})$):

$$C_T = (1 + k) \times Cost(T) + Cost(T^{-1}),$$

where $k$ denotes the cost weighting for the different lengths of the secret
key. For the cost $C_T(\#XOR)$, the value of $k$ is 1 for a 128-bit secret key,
1.5 for 192 bits, and 2 for 256 bits (the key length divided by the 128-bit
block length). For the cost $C_T(t_{XOR})$, $k$ is equal to 0 because the secret
key can be transformed at the same time as the plaintext.
The total cost of the AES design is denoted by $C$ and can be expressed as:

$$C = t \times C_T + s \times C_S + p \times C_P,$$

where $t$, $s$ and $p$ denote the cost weightings of the transformation matrices,
the SubBytes architecture, and the MixColumns operations, respectively. With
different circuit architectures, the values of $t$, $s$ and $p$ change. Here we
only discuss two frequently used architectures: the iterative circuit and the
pipelined circuit.

Iterative circuit. In this architecture, one round of the cipher is implemented
in hardware. For an $n$-round cipher, this hardware is used $n$ times. The
advantage of the iterative circuit is its area efficiency; on the other hand, it
must iterate $n$ times to encrypt or decrypt a block. Our design criterion is to
minimize the costs $C(\#XOR)$ and $C(t_{XOR})$, given by:

$$C(\#XOR) = 16 \times C_T(\#XOR) + 16 \times C_S(\#XOR) + 4 \times C_P(\#XOR),$$
$$C(t_{XOR}) = n \times C_S(t_{XOR}) + (n - 1) \times C_P(t_{XOR}).$$

Pipelining circuit. In this architecture, each round of the cipher is
implemented in hardware and the rounds are pipelined, so registers must be
included between the rounds. The advantage of the pipelined circuit is its high
data throughput, since on average encryption or decryption of a block requires
only one clock cycle. However, the area cost also increases greatly. Our design
criterion is to minimize the costs $C(\#XOR)$ and $C(t_{XOR})$, given by:

$$C(\#XOR) = 16 \times C_T(\#XOR) + 16 \times n \times C_S(\#XOR) + 4 \times (n - 1) \times C_P(\#XOR),$$
$$C(t_{XOR}) = C_S(t_{XOR}) + C_P(t_{XOR}).$$

Here, $n$ is the number of rounds of AES: for a 128-, 192- or 256-bit secret key,
$n$ is 10, 12 or 14, respectively.



4.2 The Feasibility of the Dual Ciphers’ Application 



The dual AES $\{\{11D\}_x, \{02\}_x\}$ discussed in the previous section has special
transformation matrices $T$ and $T^{-1}$, which are identical:

$$T = T^{-1} = \begin{bmatrix}
1&1&1&1&1&1&1&1\\
0&1&0&1&0&1&0&1\\
0&0&1&1&0&0&1&1\\
0&0&0&1&0&0&0&1\\
0&0&0&0&1&1&1&1\\
0&0&0&0&0&1&0&1\\
0&0&0&0&0&0&1&1\\
0&0&0&0&0&0&0&1
\end{bmatrix}$$

We can express $T$ and $T^{-1}$ with the following equations:

$$b = \text{Tran.}(a)\ (\text{or } \text{Tran.}^{-1}(a)), \quad a, b \in GF(2^8)$$
$$\begin{aligned}
b_0 &= a_0 \oplus a_1 \oplus (a_2 \oplus a_3) \oplus (a_4 \oplus a_5 \oplus (a_6 \oplus a_7)), & b_4 &= a_4 \oplus a_5 \oplus (a_6 \oplus a_7),\\
b_1 &= a_1 \oplus a_3 \oplus (a_5 \oplus a_7), & b_5 &= a_5 \oplus a_7,\\
b_2 &= (a_2 \oplus a_3) \oplus (a_6 \oplus a_7), & b_6 &= a_6 \oplus a_7,\\
b_3 &= a_3 \oplus a_7, & b_7 &= a_7.
\end{aligned}$$
We can calculate the cost $C_T(\#XOR)$ of the transformation matrices $T$ and
$T^{-1}$ with $C_T(\#XOR) = (1 + k) \times Cost(T) + Cost(T^{-1})$. With $k = 1$
(128-bit key) and 12 XOR gates for each of $T$ and $T^{-1}$, this gives
$C_T(\#XOR) = (1 + 1) \times 12 + 12 = 36$. The cost $C_T(t_{XOR})$ of the
transformation matrices can be omitted because it does not affect this
evaluation.

For the dual AES $\{\{11D\}_x, \{02\}_x\}$, the MixColumns polynomial is
$\{03\}_x + \{01\}_x x + \{01\}_x x^2 + \{02\}_x x^3$. Interestingly, the coefficients
of this polynomial are the same as those in the original AES, differing only in
their order. Therefore, the MixColumns cost of this dual AES is the same as that
of the original AES, and so is the InvMixColumns cost. Thus the cost $C_P$ is
the same for AES and this dual AES.

According to Section 3, the costs of SubBytes are $C_S(t_{XOR}) = 7$ for
encryption, $C_S(t_{XOR}) = 7$ for decryption, and $C_S(\#XOR) = 37$. The costs
$C(\#XOR)$ and $C(t_{XOR})$ for the whole dual AES design are then as follows:

For the iterative circuit:

$C(\#XOR) = 16 \times 36 + 16 \times 37 = 1168$
$C(t_{XOR}) = 10 \times 7 = 70$ (for encryption)
$C(t_{XOR}) = 10 \times 7 = 70$ (for decryption)

For the pipelining circuit:

$C(\#XOR) = 16 \times 36 + 16 \times 10 \times 37 = 6496$
$C(t_{XOR}) = 7$ (for encryption)
$C(t_{XOR}) = 7$ (for decryption)

The cost formulas derived above can also be applied to the implementation by
Wolkerstorfer et al. [4]. The cost of the transformation is 0, because that
implementation uses the original AES. The costs $C(\#XOR)$ and $C(t_{XOR})$ for
Wolkerstorfer et al.’s design are:

For the iterative circuit:

$C(\#XOR) = 16 \times 0 + 16 \times 59 = 944$
$C(t_{XOR}) = 10 \times 10 = 100$ (for encryption)
$C(t_{XOR}) = 10 \times 9 = 90$ (for decryption)

For the pipelining circuit:

$C(\#XOR) = 16 \times 0 + 16 \times 10 \times 59 = 9440$
$C(t_{XOR}) = 10$ (for encryption)
$C(t_{XOR}) = 9$ (for decryption)
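The cost arithmetic of this section is easy to reproduce. The Python helper below is our own sketch of the formulas of Section 4.1, and the names `gate_cost` and `delay_cost` are hypothetical. It follows two simplifications from the text:

- The MixColumns term $C_P$ is set to 0, because it is identical for both designs.
- The delay of $T$ and $T^{-1}$ is ignored ($k = 0$ for $t_{XOR}$).

```python
def gate_cost(c_t: float, c_s: int, c_p: int = 0, n: int = 10,
              pipelined: bool = False) -> float:
    """C(#XOR): 16 T-units, 16 (or 16n) S-boxes, 4 (or 4(n-1)) MixColumns."""
    sboxes = 16 * n if pipelined else 16
    mixcols = 4 * (n - 1) if pipelined else 4
    return 16 * c_t + sboxes * c_s + mixcols * c_p


def delay_cost(t_s: int, t_p: int = 0, n: int = 10,
               pipelined: bool = False) -> int:
    """C(t_XOR): one round per stage if pipelined, otherwise n rounds."""
    return t_s + t_p if pipelined else n * t_s + (n - 1) * t_p


k = 1                                    # 128-bit key
c_t_dual = (1 + k) * 12 + 12             # C_T(#XOR) = 36 for {{11D}, {02}}

for name, c_t, c_s, t_enc, t_dec in (("dual AES", c_t_dual, 37, 7, 7),
                                     ("Wolkerstorfer", 0, 59, 10, 9)):
    for pipe in (False, True):
        print(name, "pipelined" if pipe else "iterative",
              gate_cost(c_t, c_s, pipelined=pipe),
              delay_cost(t_enc, pipelined=pipe),
              delay_cost(t_dec, pipelined=pipe))
```

Under the same assumptions, setting `n=12` or `n=14`, with `k = 1.5` or `k = 2`, gives the corresponding figures for 192-bit and 256-bit keys.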

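Table 4 compares the results of our design with those of Wolkerstorfer et al.
With a pipelining circuit, our design is better than Wolkerstorfer et al.’s in
both time and space complexity: it reduces the area cost by about 31% (from 9440
to 6496 XOR gates) and the delay cost by 30% for encryption (from 10 to 7) and
by about 22% for decryption (from 9 to 7). With the iterative circuit, our design
gives the same relative improvement in delay (from 100 to 70 for encryption and
from 90 to 70 for decryption), at the price of a larger area (1168 versus 944 XOR
gates, about 24% more), because the fixed cost of $T$ and $T^{-1}$ is not
amortized over several hardware rounds.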

Table 4. Cost comparison of AES with different circuitry

Circuit       Cost       Our design   Wolkerstorfer et al.’s
Iterative     C(#XOR)    1168         944
              C(t(E))    70           100
              C(t(D))    70           90
Pipelining    C(#XOR)    6496         9440
              C(t(E))    7            10
              C(t(D))    7            9

5 Conclusion

There are two major directions in ASIC design for AES: one uses table-lookup and
the other uses a composite field. However, the table-lookup method is not
efficient. Is a composite field more efficient? We propose an approach that uses
a dual AES in combination with a composite field. In this article, we also
present a generalized form of the dual AES and propose a method for determining
a dual AES. Based on the theory of finite fields, the dual AES may have more
attractive characteristics. Daemen and Rijmen called this special finite field
RIJNDAEL-GF [10]. Research on and applications of the dual AES are just
beginning.
Appendix A

The dual ciphers of AES are denoted by the pair $\{R(x), \beta\}$. All 240 dual
ciphers are listed as follows:

R(x)   Generator $\beta$ of the dual AES $\{R(x), \beta\}$


11B    03 05 11 1A 4C 5F E5 FB
11D    02 04 10 1D 4C 5F 85 9D
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12B 


49 
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8B 


9B 


9D 


9F 


AO 


A7 


12D 


2A 


3F 
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CC 


FO 
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88 
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5B 
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CC 


E3 


E5 
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14D 


OD 


18 


IF 
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F6 


FF 


15F 


OB 


19 


IE 


45 
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3F 


5B 


5F 
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BA 


D9 


DB 
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49 


4C 


B5 


B6 


C6 


D1 
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21 


23 


31 


32 
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C8 


CC 


171 


17 
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5C 


64 


93 


95 


AC 


B8 


177 


16 


29 


4E 


63 


C6 


D2 


EA 


EC 


17B 


26 


35 


49 


5B 


83 


94 


EB 


FD 


187 


07 


15 


37 


73 


96 


CA 


E3 


E9 


18B 


32 


3E 
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7D 


85 


BC 
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2A 


32 
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A2 


Cl 
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E7 
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14 
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8F 


AC 


CD 
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1A3 
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4F 


52 


75 
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CE 
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E8 
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FI 
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47 
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1D7 


IB 
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EF 


IDD 
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96 


A6 


B5 


1E7 
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14 


2E 
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1F3 


21 
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32 
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71 
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AB 
Abstract. This paper analyses periodic properties of counter assisted
stream ciphers. In particular, we analyze constructions where the
counter system also has the purpose of providing additional complexity.
We then apply the results to the recently proposed stream cipher
Rabbit, and increase the lower bound on the internal state period
length from $2^{158}$ to $2^{215}$. With reasonable assumptions we illustrate
that the period length of Rabbit is at least the period of the counter
system, i.e. at least $2^{256} - 1$. The investigations are related to a “mod
3” characteristic of Rabbit. Attacks based on this characteristic are
discussed and found infeasible.

Keywords. Stream cipher, period, counter, diversity, degeneracy, Rabbit



1 Introduction 

An important problem in the construction of stream ciphers based on iterative
number generators of the form $x_{i+1} = f(x_i) \bmod m$ is to ensure that the
state variable, $x_i$, does not enter short periods. Most stream ciphers are based
on Linear Feedback Shift Registers (LFSRs), which have provable period properties
(see e.g. [1]). LFSRs are simple and linear, and additional measures must be
taken to ensure the security of such ciphers (see e.g. SNOW [2]). Another
approach is to iterate a non-linear function $f$. However, the periodic
properties then generally become uncertain. In [3] Shamir and Tsaban propose to
use counter assisted generators with the iteration scheme
$x_{i+1} = f(x_i) + c_i \bmod m$, where $c_i$ is the independent counter state^1. If
$c_i \neq c_j$ within a counter period, $N_c$, for all $j - i \bmod N_c \neq 0$, then
the generator will have the same period length as the counter system. In this
construction the counter value is added before the pseudo-random data is
extracted, and the only secret part is located in the $x_i$ variable [3]. For
instance, by a linear combination of the output it might be possible to make the
counter system vanish [4]. Therefore, in some situations it might be beneficial
to extract before the counter state is added, i.e.
$x_{i+1} = f(x_i + c_i \bmod m) \bmod m$. In this way the counter system can provide
additional complexity, but this construction makes the period properties depend
on the specific function $f$.

^1 In this context $+$ denotes the usual addition modulo $m$. However, any Latin
square operation can be used, see [3] for details.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 39-53, 2004.

The aim of this paper is to extend parts of the analysis performed by Shamir
and Tsaban [3] on the periodic properties of counter assisted stream ciphers. 
In particular, we aim at providing lower bounds for periods and diversities of 
generators where the complexity of the counter system is included as described 
above. As an example of such a construction we analyze the periodic properties of 
the recently proposed Rabbit stream cipher [5]. Strict bounds on the diversities 
and periods are determined and corresponding expectation values are calculated. 
The investigations are related to a “mod 3” characteristic of Rabbit. We analyze 
this characteristic in the last part of the paper. 

The rest of the paper is organized as follows. In section two we perform an 
analysis of counter assisted generators and provide strict lower bounds on the 
internal state periods and diversities. In sections three and four we analytically 
and statistically, respectively, analyze the internal state period of Rabbit. We 
then analyze a “mod 3” characteristic of Rabbit in section five. We conclude and
summarize in section six. 



2 Period and Diversity Properties 

In this section we provide basic definitions and strict lower bounds on the periods 
and diversities for simple systems. 



2.1 Basic Definitions 

We will need the following definitions. 

Definition 1: Define the diversity, $D_s$, of a sequence, $s_i$, with period length
$N_s$ to be the number of distinct elements in the sequence. Note that $D_s \le N_s$.

Definition 2: Define the degeneracy, $d_f(u)$, of an output point, $u = f(s)$, to be
the number of distinct input points, $s$, resulting in the same output point $u$.
Furthermore, define the degeneracy, $d_f$, of the map $u = f(s)$ to be the maximal
$d_f(u)$ over all $u \in \mathrm{Im}(f)$.

Next, we provide diversity and period relations for simple systems to be used 
for more complicated constructions. 

Consider a function $u = f(s) \bmod m$, as illustrated in the left part of Fig. 1,
where $s \in \{0, \ldots, m-1\}$ and $u \in \mathrm{Im}(f) \subseteq \{0, \ldots, m-1\}$.
Then we have

$$N_u \le N_s, \qquad (1)$$
Fig. 1. The left figure illustrates the system $u = f(s) \bmod m$, and the right
figure illustrates the system $u = s + t \bmod m$.



and 




(2) 



Note, that if / is bijective, i.e. c?f = 1, we have equalities in the above relations. 
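Relations (1) and (2) can be checked numerically on a toy example. The sketch below applies a hypothetical random map $f$ (stored as a lookup table) to one period of a random toy sequence and compares periods, diversities and the degeneracy $d_f$. The helper names `min_period` and `max_degeneracy` are illustrative, not from the paper.

```python
import random

def min_period(seq):
    """Minimal period of the infinite sequence obtained by repeating seq."""
    n = len(seq)
    for p in range(1, n + 1):
        if n % p == 0 and all(seq[i] == seq[(i + p) % n] for i in range(n)):
            return p
    return n

def max_degeneracy(table):
    """d_f: largest number of inputs s sharing one output f(s) = table[s]."""
    counts = {}
    for u in table:
        counts[u] = counts.get(u, 0) + 1
    return max(counts.values())

m = 16
rng = random.Random(1)
table = [rng.randrange(m) for _ in range(m)]    # hypothetical map f on {0,...,m-1}
s_seq = [rng.randrange(m) for _ in range(24)]   # one period of a toy sequence s_i
u_seq = [table[s] for s in s_seq]               # u_i = f(s_i)

N_s, N_u = min_period(s_seq), min_period(u_seq)
D_s, D_u = len(set(s_seq)), len(set(u_seq))
d_f = max_degeneracy(table)
print(N_s, N_u, D_s, D_u, d_f)

assert N_s % N_u == 0        # eq. (1): N_u <= N_s (here N_u even divides N_s)
assert D_u * d_f >= D_s      # eq. (2): D_u >= D_s / d_f, written without division
```

Writing eq. (2) as $D_u \cdot d_f \ge D_s$ avoids floating-point division: every distinct output value can absorb at most $d_f$ distinct inputs.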

Also, consider the system defined by $u = s + t \bmod m$, where $s \in \{0, \ldots, m-1\}$, $t \in \{0, \ldots, m-1\}$ and $u \in \{0, \ldots, m-1\}$, as illustrated in the right part of Fig. 1. The following properties for this system hold true:

$$N_u = \frac{N_s N_t}{\gcd(N_s, N_t)}, \qquad (3)$$

and

$$D_u \ge \begin{cases} D_s / D_t & \text{if } D_s \ge D_t, \\ D_t / D_s & \text{if } D_t \ge D_s. \end{cases} \qquad (4)$$
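The diversity relation (4) follows from a counting argument: for each distinct value of $s$, pick one index $i$ where it occurs. The pair $(u_i, t_i)$ then determines $s = u_i - t_i \bmod m$, so $D_s \le D_u \cdot D_t$, and symmetrically $D_t \le D_u \cdot D_s$. The sketch below checks this on hypothetical random toy sequences that share a common period of length 30.

```python
import random

m = 64
rng = random.Random(7)
# Hypothetical toy data: one common period (length 30) of s_i and t_i.
s_seq = [rng.randrange(m) for _ in range(30)]
t_seq = [rng.randrange(8) for _ in range(30)]      # t_i takes at most 8 values
u_seq = [(s + t) % m for s, t in zip(s_seq, t_seq)]

D_s, D_t, D_u = len(set(s_seq)), len(set(t_seq)), len(set(u_seq))

# Each distinct s is recovered from a (u, t) pair as s = u - t mod m, so
# D_s <= D_u * D_t (and symmetrically D_t <= D_u * D_s), which is eq. (4).
print(D_s, D_t, D_u)
assert D_u * min(D_s, D_t) >= max(D_s, D_t)
```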



Definition 3: In the present context a counter assisted vector valued next-state function is defined in the following way

$$x_{i+1} = F(y_i) \bmod m, \qquad (5)$$

where

$$y_i = (x_i + c_i) \bmod m, \qquad (6)$$

such that $x_i$ is the internal state variable. The counter state, $c_i$, is generated by any $F$-independent generator, and $c_i \ne c_j$ for $i - j \bmod N_c \ne 0$, where $N_c$ is the (known) counter period; $y_i$ is the counter modified internal state and $m$ is the size of the state space for each vector component. The system is illustrated in Fig. 2.



2.2 Results for Counter Assisted Generators 

In the following we analyze the general construction given in Definition 3 above¹.

¹ Note that eqs. (7) and (9) with proofs already appeared in Appendix C of [5], but they are included for completeness of the description.

Fig. 2. Illustrates the counter assisted system described in Definition 3. The vector, $a$, is the increment value for the counter system. Furthermore, $e$ denotes the extraction function.



According to Lemma 4.2 in [3], $y_i$ will have at least the period of the counter system, $N_c$.

Proof: Assume that there exists $y_i = y_j$ for $i - j \bmod N_c \ne 0$. Then $y_{i+1} = F(y_i) + c_{i+1}$ and $y_{j+1} = F(y_j) + c_{j+1}$. Moreover, we have $c_{i+1} \ne c_{j+1}$; therefore, $y_{i+1} \ne y_{j+1}$. Finally, if $y_{i-1} = y_{j-1}$, this would imply that $y_i \ne y_j$, which is a contradiction. Thus, also $y_{i-1} \ne y_{j-1}$ ■

The diversity, $D_y$, of the periodic sequence, $y_i$, will be at least the square root of the diversity, $D_c$, of the counter system, or at least the period, $N_c = D_c$, of the counter system divided by the size of the image of $F$, $|\mathrm{Im}(F)|$, all according to Theorem 4.3 in [3].

It is not guaranteed that $x_i$ will have at least the period of the counter system (see Definition 3 or Fig. 2), but lower bounds for the period, $N_x$, and diversity, $D_x$, of the periodic sequence, $x_i$, can still be obtained.

First, we note that there are relations between the counter period, $N_c$, the internal state period, $N_x$, and the period of the $y$ variables, $N_y$:

$$N_y = a N_x = b N_c, \qquad (7)$$

where $a$ and $b$ are integers greater than zero with $\gcd(a, b) = 1$.

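Proof: According to eq. (1), we have $N_x \le N_y$. In particular, $N_x$ divides $N_y$, because if we assume that this is not the case, then there would exist an $i$ such that $F(y_i) = x_{i+1} \ne x_{i+1+N_y} = F(y_{i+N_y})$, which contradicts the $N_y$ periodicity. Thus, there exists an integer, $a > 0$, such that $N_y = a N_x$. We also have that $N_c$ divides $N_y$, because if this was not the case then $c_i \ne c_{i+N_y}$. We just showed that $x_i = x_{i+N_y}$ for all $i$, but then $y_i = x_i + c_i \ne x_{i+N_y} + c_{i+N_y} = y_{i+N_y}$, which again contradicts the $N_y$ periodicity. Therefore, there exists an integer, $b > 0$, such that $N_y = b N_c$, and consequently $N_y = a N_x = b N_c$. Finally, since $y_i = x_i + c_i$ is periodic with period $\mathrm{lcm}(N_x, N_c)$ and both $N_x$ and $N_c$ divide $N_y$, we have $N_y = \mathrm{lcm}(N_x, N_c)$, which gives $\gcd(a, b) = 1$ ■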
Let $d_F(x_j)$ be the degeneracy of a point $x_j$ in a periodic solution generated by the next-state function, $F$. Furthermore, let $d_{\min}$ be the smallest $d_F(x_j)$ of all $x_j$ belonging to the periodic solution. We then obtain the following bound:

$$N_x \ge \frac{b N_c}{d_{\min}}. \qquad (8)$$

Proof: Since $N_x = b N_c / a$, we want to show that $d_{\min} \ge a$. For $a = 1$ this is trivially fulfilled. For $a > 1$ the periodicity gives $x_i = x_{i+N_x} = x_{i+2N_x} = \cdots = x_{i+(a-1)N_x}$. On the other hand, the corresponding counter values $c_i, c_{i+N_x}, c_{i+2N_x}, \ldots, c_{i+(a-1)N_x}$ are pairwise distinct. This is true since $\gcd(a, b) = 1$ and $c_i \ne c_j$ for $i - j \bmod N_c \ne 0$: an index difference $k N_x = k b N_c / a$ with $0 < k < a$ is never a multiple of $N_c$. Therefore, the values $x_i + c_i, x_{i+N_x} + c_{i+N_x}, \ldots, x_{i+(a-1)N_x} + c_{i+(a-1)N_x}$ are pairwise distinct, or equivalently, $y_i, y_{i+N_x}, y_{i+2N_x}, \ldots, y_{i+(a-1)N_x}$ are pairwise distinct. Because of the periodicity we have $F(y_i) = F(y_{i+N_x}) = F(y_{i+2N_x}) = \cdots = F(y_{i+(a-1)N_x})$. Therefore, the existence of such a periodic solution requires that the degeneracy of each point in the period is at least $a$. Thus, we have $d_{\min} \ge a$, and eq. (8) follows ■



Clearly, it is difficult to use the bound given in eq. (8) in practice, as $b$ and $d_{\min}$ depend on the specific periodic solution. However, a more general bound follows trivially by noting that $b \ge 1$ and that $d_F \ge d_{\min}$, where $d_F$ is the degeneracy of the next-state function:

$$N_x \ge \frac{N_c}{d_F}. \qquad (9)$$



A lower bound for the diversity of the $x_i$ sequence can also be specified. According to eq. (2), $D_x$ is bounded by $D_x \ge D_y / d_F$. For the present purpose, a part of Theorem 4.3 in [3] is useful: $D_y \ge N_c / D_x$, which follows from eq. (4) and $N_c = D_c$. Therefore, we can write

$$\frac{N_c}{D_x} \le D_y \le D_x \cdot d_F, \qquad (10)$$

to obtain the following bound:

$$D_x \ge \sqrt{\frac{N_c}{d_F}}. \qquad (11)$$
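The relations (7), (8), (9) and (11) can be exercised on a small scalar instance of Definition 3. The sketch below assumes a hypothetical random next-state table `F` on $\{0, \ldots, m-1\}$ and the hypothetical counter $c_i = i \bmod m$, so that $N_c = m$. It follows the joint state $(x_i, c_i)$ until it cycles and then checks every bound on the periodic part.

```python
import math
import random
from collections import Counter

def min_period(seq):
    """Minimal period of the infinite sequence obtained by repeating seq."""
    n = len(seq)
    for p in range(1, n + 1):
        if n % p == 0 and all(seq[i] == seq[(i + p) % n] for i in range(n)):
            return p
    return n

def periodic_part(F, m, x0):
    """Scalar toy version of Definition 3 with the hypothetical counter c_i = i mod m,
    so N_c = m and c_i != c_j whenever i - j mod N_c != 0.
    Returns one full period of the joint orbit as (x_i, c_i) pairs."""
    seen, states = {}, []
    x, c = x0, 0
    while (x, c) not in seen:
        seen[(x, c)] = len(states)
        states.append((x, c))
        x, c = F[(x + c) % m], (c + 1) % m      # x_{i+1} = F(y_i), y_i = x_i + c_i
    return states[seen[(x, c)]:]

def check_bounds(F, m, x0):
    cycle = periodic_part(F, m, x0)
    xs = [x for x, _ in cycle]
    ys = [(x + c) % m for x, c in cycle]
    N_c, N_x, N_y = m, min_period(xs), min_period(ys)
    a, b = N_y // N_x, N_y // N_c
    pre = Counter(F)                            # pre[v] = d_F(v), pre-images of v
    d_F = max(pre.values())
    d_min = min(pre[x] for x in xs)
    D_x = len(set(xs))
    assert N_y % N_x == 0 and N_y % N_c == 0 and math.gcd(a, b) == 1   # eq. (7)
    assert N_x * d_min >= b * N_c                                    # eq. (8)
    assert N_x * d_F >= N_c                                          # eq. (9)
    assert D_x * D_x * d_F >= N_c                                    # eq. (11)
    return N_x, N_c, d_F, D_x

rng = random.Random(2004)
m = 32
results = []
for _ in range(50):
    F = [rng.randrange(m) for _ in range(m)]     # hypothetical random next-state map
    results.append(check_bounds(F, m, rng.randrange(m)))
print(results[:3])
```

For example, a constant map `F = [0, 0]` with `m = 2` gives $N_x = 1$ and $d_F = 2$. This meets eq. (9) with equality, since $N_x \cdot d_F = N_c$.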



The above bounds given in eqs. (9) and (11) can be improved further. Clearly, using only the degeneracy of the next-state function might result in a rather pessimistic lower bound on both the period length, $N_x$, and the diversity, $D_x$, unless most of the points in $\mathrm{Im}(F)$ have $d_F$ pre-images. In general, there will be a distribution of degeneracies, i.e. each output point will have a certain degeneracy, $d(x_j)$. We now define a sorted list, $\{d_F(1), d_F(2), \ldots, d_F(|\mathrm{Im}(F)| - 1), d_F(|\mathrm{Im}(F)|)\}$, of degeneracies of points in $\mathrm{Im}(F)$ such that $d_F(j) \ge d_F(i)$ if $j < i$. Thus, the bound $d_F \cdot D_x \ge D_y$ generalizes to

$$\sum_{j=1}^{D_x} d_F(j) \ge D_y. \qquad (12)$$
Consequently, $D_x$ and $N_x$ are bounded by

$$\sum_{j=1}^{D_x} d_F(j) \ge \frac{N_c}{D_x} \qquad (13)$$

and

$$N_x \ge \frac{N_c}{d_F(D_x)}, \qquad (14)$$

where eqs. (8), (10) and (12) were used.
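Given a degeneracy histogram, eqs. (13) and (14) can be evaluated mechanically. The condition $D \cdot \sum_{j \le D} d_F(j) \ge N_c$ is monotone in $D$, so the smallest $D$ that satisfies it, $D_{\min}$, is a lower bound on $D_x$. Because the sorted list is non-increasing, $d_F(D_x) \le d_F(D_{\min})$, which gives $N_x \ge N_c / d_F(D_{\min})$. The sketch below works on a histogram $\{d: n(d)\}$ rather than an explicit list, so it also handles astronomically large counts. The function name and the toy histogram are hypothetical.

```python
import math

def improved_bounds(hist, N_c):
    """hist maps a degeneracy d to n(d), the number of image points with exactly
    d pre-images. Walks the sorted list d_F(1) >= d_F(2) >= ... and returns
    (D_min, d_F(D_min)): D_min is the smallest D_x allowed by eq. (13), and
    N_c / d_F(D_min) is then a lower bound on N_x via eq. (14)."""
    D0, S0 = 0, 0                          # points used so far, sum of their degeneracies
    for d in sorted(hist, reverse=True):
        n = hist[d]
        if (D0 + n) * (S0 + n * d) >= N_c:
            lo, hi = 1, n                  # smallest k with (D0+k)(S0+k*d) >= N_c
            while lo < hi:
                mid = (lo + hi) // 2
                if (D0 + mid) * (S0 + mid * d) >= N_c:
                    hi = mid
                else:
                    lo = mid + 1
            return D0 + lo, d
        D0, S0 = D0 + n, S0 + n * d
    raise ValueError("degeneracy list too short for this N_c")

# Hypothetical toy histogram: one point with 4 pre-images, three with 2, ten with 1.
hist = {4: 1, 2: 3, 1: 10}
N_c = 50
D_min, d_at = improved_bounds(hist, N_c)
d_F = max(hist)
print("eq. (13)/(14):", D_min, N_c / d_at)
print("eq. (11)/(9): ", math.sqrt(N_c / d_F), N_c / d_F)
```

For this toy histogram the improved bounds give $D_x \ge 5$ and $N_x \ge 50$, against roughly $3.5$ and $12.5$ from eqs. (11) and (9).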



3 Strict Bound on the Internal State Period of Rabbit 

In order to illustrate the use of the above results, we analyze Rabbit [5] in the 
following. 

The core of the Rabbit algorithm is the iteration of the next-state function defined by

$$x_{j,i+1} = \begin{cases} g_{j,i} + (g_{j-1 \bmod 8,\,i} \lll 16) + (g_{j-2 \bmod 8,\,i} \lll 16) & \text{for } j \text{ even}, \\ g_{j,i} + (g_{j-1 \bmod 8,\,i} \lll 8) + g_{j-2 \bmod 8,\,i} & \text{for } j \text{ odd}, \end{cases} \qquad (15)$$

$$g_{j,i} = \left( (x_{j,i} + c_{j,i})^2 \oplus \left( (x_{j,i} + c_{j,i})^2 \gg 32 \right) \right) \bmod 2^{32}, \qquad (16)$$

where $j \in \{0, \ldots, 7\}$ and all additions are modulo $2^{32}$. For convenience, we write the next-state function in the following way

$$x_{i+1} = R(y_i) = F(\mathbf{g}(y_i)) \bmod 2^{32}, \qquad (17)$$

where

$$y_i = (x_i + c_i) \bmod 2^{32}, \qquad (18)$$

such that $x_i$ is the internal state variable, $c_i$ is the counter state, $\mathbf{g}$ is the vector of $g$-functions and $F$ is the combining function containing the rotations and additions of the eight $g$-functions. The system is illustrated in Fig. 3.

According to eqs. (9) and (11), we need to know the counter period, $N_c$, as well as the maximal degeneracy, $d_R$, of the next-state function in order to obtain a lower bound on the period length, $N_x$, and the diversity, $D_x$, of the internal state. If we use eq. (13) together with eq. (14), we also need to know the degeneracy distribution of the next-state function. The counter period is shown in [5] to be $N_c = 2^{256} - 1$. In the following we calculate bounds on the degeneracy as well as the degeneracy distribution of the next-state function of Rabbit.








Fig. 3. Illustrates the Rabbit algorithm. 



3.1 A Bound on the Degeneracy 

The degeneracy, $d_R$, is bounded by

$$d_R \le d_F \cdot d_{\mathbf{g}} = d_F \cdot d_g^8, \qquad (19)$$

where $d_{\mathbf{g}}$ is the degeneracy of the vector of $g$-functions, $d_g$ is the degeneracy for each component of the $g$-function and $d_F$ is the degeneracy of the combining function, $F$.

The degeneracy of the $g$-function, $d_g$, can easily be obtained by running through all its $2^{32}$ possible inputs. It turns out that there is one output with 18 inputs and all other images have smaller degeneracies, i.e. $d_g = 18$. The exact distribution is shown in Fig. 4. However, the degeneracy of the combining function, $F$, cannot be obtained exactly by a measurement, but in the following we provide an upper bound.
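The exhaustive measurement of $d_g$ amounts to counting pre-images over all inputs. The sketch below does this for a hypothetical down-scaled $n$-bit analogue of the $g$-function, with $n = 12$ so that it runs instantly in Python. It illustrates the method only and does not reproduce the 32-bit figures. For the real function, the same loop over $2^{32}$ inputs is more practical in a compiled language with a $2^{32}$-entry counter array.

```python
from collections import Counter

def g_small(u, n):
    """Hypothetical n-bit analogue of Rabbit's g-function: square an n-bit input
    and XOR the low and high n-bit halves of the 2n-bit square."""
    sq = u * u
    return (sq ^ (sq >> n)) & ((1 << n) - 1)

def degeneracy_distribution(n):
    """Return {d: n(d)}: how many outputs have exactly d pre-images, found by
    running through all 2^n inputs (the same exhaustive method as for d_g)."""
    preimages = Counter(g_small(u, n) for u in range(1 << n))
    return Counter(preimages.values())

dist = degeneracy_distribution(12)
print("max degeneracy:", max(dist), "distribution:", sorted(dist.items()))
```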

Consider the three equation systems given in eqs. (20), (21) and (22) below:

$$x = F(y) \bmod 2^{32}, \qquad (20)$$

arising by replacing all the $g$-functions by identity functions but keeping the rotations, with $y \in \{0, \ldots, 2^{32} - 1\}^8$;

$$w = F(y), \qquad (21)$$

the same as eq. (20) but without the addition modulus; and finally

$$z = F(y) \bmod (2^{32} - 1), \qquad (22)$$

where $y \in \{0, \ldots, 2^{32} - 2\}^8$. Furthermore, the rotation operation entering in the $F$-function can be written as

$$y \lll k = \begin{cases} 2^k y \bmod (2^{32} - 1) & \text{if } y < 2^{32} - 1, \\ 2^{32} - 1 & \text{if } y = 2^{32} - 1. \end{cases} \qquad (23)$$
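Identity (23) holds because $2^{32} \equiv 1 \pmod{2^{32} - 1}$. The bits shifted out at the top therefore re-enter at the bottom exactly as a left rotation does, and only the all-ones word needs special treatment. The sketch below compares both sides on random words and on a few edge cases.

```python
import random

M = (1 << 32) - 1          # 2^32 - 1, also the 32-bit all-ones mask

def rotl32(v, k):
    """32-bit left rotation by k bits (0 < k < 32)."""
    return ((v << k) | (v >> (32 - k))) & M

def rot_as_mult(y, k):
    """Right-hand side of eq. (23)."""
    return (pow(2, k) * y) % M if y < M else M

rng = random.Random(23)
samples = [rng.randrange(1 << 32) for _ in range(10000)] + [0, 1, M - 1, M]
mismatches = [(y, k) for y in samples for k in (8, 16) if rotl32(y, k) != rot_as_mult(y, k)]
print(len(mismatches))
```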



The system given in eq. (20) is non-linear. To circumvent this difficulty we first analyze the system defined in eq. (22), which is linear and can be written as

$$z = B y \bmod (2^{32} - 1). \qquad (24)$$
Fig. 4. The degeneracy distribution for the $g$-function, where $n(d)$ denotes the number of points having degeneracy $d$.

The matrix $B$ is then given by

$$B = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 2^{16} & 2^{16} \\
2^{8} & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
2^{16} & 2^{16} & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 2^{8} & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 2^{16} & 2^{16} & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 2^{8} & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 2^{16} & 2^{16} & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 2^{8} & 1
\end{pmatrix}.$$

$B$ is not invertible modulo $2^{32} - 1$, since 3 divides both the determinant and the modulus. $B$ is therefore invertible modulo $(2^{32} - 1)/3$ but not invertible modulo 3. Furthermore, it can be verified that $B$ modulo 3 is 3-to-1. Consequently, according to the Chinese Remainder Theorem (CRT), $B$ is also 3-to-1 modulo $2^{32} - 1$. The kernel of the map consists of the three vectors $\beta_0 = (0, \ldots, 0)$, $\beta_1 = ((2^{32} - 1)/3, \ldots, (2^{32} - 1)/3)$ and $\beta_2 = (2(2^{32} - 1)/3, \ldots, 2(2^{32} - 1)/3)$.
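The claim that $B$ modulo 3 is 3-to-1 can be checked by brute force over all $3^8$ vectors. The same run confirms that the constant vectors $\beta_1, \beta_2$ lie in the kernel modulo $2^{32} - 1$. The sketch below builds $B$ from the row pattern of eq. (15), with rotations replaced by multiplications as in eq. (23).

```python
from itertools import product

M = (1 << 32) - 1

# Matrix B of eq. (24): row j holds the coefficients of y_j, y_{j-1}, y_{j-2}
# (indices mod 8) after rewriting each rotation as a multiplication, eq. (23).
B = [[0] * 8 for _ in range(8)]
for j in range(8):
    B[j][j] = 1
    if j % 2 == 0:
        B[j][(j - 1) % 8] = 1 << 16
        B[j][(j - 2) % 8] = 1 << 16
    else:
        B[j][(j - 1) % 8] = 1 << 8
        B[j][(j - 2) % 8] = 1

def mat_vec(A, v, mod):
    return tuple(sum(a * x for a, x in zip(row, v)) % mod for row in A)

# Exhaustive search over the 3^8 vectors for the kernel of B modulo 3.
kernel_mod3 = [v for v in product(range(3), repeat=8)
               if mat_vec(B, v, 3) == (0,) * 8]
print(kernel_mod3)

# Lift to modulo 2^32 - 1: beta_k = k * (2^32 - 1)/3 in every component.
betas = [tuple(k * (M // 3) for _ in range(8)) for k in range(3)]
print(all(mat_vec(B, beta, M) == (0,) * 8 for beta in betas))
```

The search finds exactly three kernel vectors modulo 3: the all-zero, all-one and all-two vectors. This matches the 3-to-1 property and the form of $\beta_0, \beta_1, \beta_2$.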

Next we consider the case where the addition modulus is omitted, i.e. the system defined in eq. (21). Clearly, the images can be at most 3-to-1 when restricting the input vectors to $y \in \{0, \ldots, 2^{32} - 2\}^8$. Including the cases where one or more of the components of the input vector are $2^{32} - 1$ does not change the 3-to-1 property of eq. (21):

Proof: The particular case where $y_j = 2^{32} - 1$ for one $j \in \{0, \ldots, 7\}$ and $y_i < 2^{32} - 1$ for $i \ne j$ corresponds to adding $F(\hat{y})$, where $\hat{y}_j = 2^{32} - 1$ and $\hat{y}_i = 0$ for $i \ne j$, to the $w$ resulting from $\tilde{y}_j = 0$ and $\tilde{y}_i = y_i$ for $i \ne j$. Of course, we could have $F(\tilde{y}) = F(\tilde{y} + \beta_1 \bmod (2^{32} - 1)) = F(\tilde{y} + \beta_2 \bmod (2^{32} - 1))$, but then none of those can be equal to $F(y)$. As eq. (21) is 1-to-1 if $y \in \{0, 2^{32} - 1\}^8$, we conclude that eq. (21) is at most 3-to-1 for all $y \in \{0, \ldots, 2^{32} - 1\}^8$ ■



Furthermore, for every output vector of the systems in eqs. (21) and (22), the sum of its elements is zero modulo 3, i.e.

$$\sum_{j=0}^{7} w_j \bmod 3 = \sum_{j=0}^{7} z_j \bmod 3 = 0. \qquad (26)$$

This follows because the sums of the column elements of the matrix, $B$, are divisible by three, i.e. $1 + 2^8 + 2^{16} \bmod 3 = 0$ and $1 + 2^{16} + 1 \bmod 3 = 0$.
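The mod-3 property (26) is easy to test directly on the integer system (21). The sketch below assumes, as in eq. (21), that the $g$-functions are replaced by identities. It checks $\sum_j w_j \equiv 0 \pmod 3$ on random inputs, sometimes including an all-ones component. It also tallies the residue of the sum after each $w_j$ is reduced modulo $2^{32}$. Because $g$ is the identity here, that tally is only an illustration that the property stops holding exactly. It is not the measurement behind eq. (40).

```python
import random
from collections import Counter

MASK32 = (1 << 32) - 1

def rotl32(v, k):
    return ((v << k) | (v >> (32 - k))) & MASK32

def combine_no_mod(y):
    """eq. (21): F with the g-functions replaced by identities and the
    additions done over the integers (no reduction modulo 2^32)."""
    w = []
    for j in range(8):
        if j % 2 == 0:
            w.append(y[j] + rotl32(y[(j - 1) % 8], 16) + rotl32(y[(j - 2) % 8], 16))
        else:
            w.append(y[j] + rotl32(y[(j - 1) % 8], 8) + y[(j - 2) % 8])
    return w

rng = random.Random(26)
reduced = Counter()
for _ in range(20000):
    y = [rng.randrange(1 << 32) for _ in range(8)]
    if rng.random() < 0.1:
        y[rng.randrange(8)] = MASK32               # include the all-ones case
    w = combine_no_mod(y)
    assert sum(w) % 3 == 0                         # eq. (26)
    reduced[sum(v & MASK32 for v in w) % 3] += 1   # after reduction modulo 2^32
print({r: reduced[r] / 20000 for r in range(3)})
```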

Finally, we investigate the system defined in eq. (20). In order for two vectors, $w_1$ and $w_2$, to be equal modulo $2^{32}$, i.e. $x_1 = x_2$, we must have

$$w_1 = w_2 + 2^{32} k, \qquad (27)$$

where $k \in \{-2, -1, 0, 1, 2\}^8$. Since $0 \le w_j \le 3(2^{32} - 1)$, for each vector $w$ there are $3^8$ relevant vectors $k$. Consequently, since the map is at most 3-to-1 without the addition modulus, the total system, eq. (20), is at most $3 \cdot 3^8$-to-1. This upper bound can be lowered using eq. (26), since we must also have

$$\sum_{j=0}^{7} 2^{32} k_j \bmod 3 = 0. \qquad (28)$$




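This limits the number of $k$-vectors by a factor of three, such that $d_F \le 3 \cdot 3^8 / 3 = 3^8$. Consequently, the upper bound for $d_R$ is

$$d_R \le 3^8 \cdot 18^8 \approx 2^{46}, \qquad (29)$$

where eq. (19) was used. Eqs. (9) and (11) then provide the lower bound on the internal state period,

$$N_x \ge \frac{2^{256} - 1}{3^8 \cdot 18^8} \approx 2^{210}, \qquad (30)$$

and on the diversity,

$$D_x \ge \sqrt{\frac{2^{256} - 1}{3^8 \cdot 18^8}} \approx 2^{105}. \qquad (31)$$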
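The numbers in eqs. (29) to (31) can be recomputed exactly with integer arithmetic. The bounds below use floor division and an integer square root, so they never overstate the true ratios.

```python
import math

log2_dR = 8 * math.log2(3) + 8 * math.log2(18)   # eq. (29): d_R <= 3^8 * 18^8
bound_Nx = (2**256 - 1) // (3**8 * 18**8)        # eq. (30), exact integer division
bound_Dx = math.isqrt(bound_Nx)                  # eq. (31), floor of the square root
print(f"d_R <= 2^{log2_dR:.2f}")
print("N_x >=", bound_Nx.bit_length(), "bits; D_x >=", bound_Dx.bit_length(), "bits")
```

The degeneracy bound comes out at about $2^{46.04}$, so the period bound lies just below $2^{210}$ and the diversity bound just below $2^{105}$.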



3.2 An Improved Bound on 

The above bounds given in eqs. (30) and (31) do not take eqs. (13) (or equivalently eq. (12)) and (14) into account. The degeneracy distribution of the next-state function is not known, but we can obtain the degeneracy distribution of the $\mathbf{g}$-function. To accomplish that, we use the degeneracy distribution of the $g$-function and combine its degeneracies into the eight-component vector in all possible ways. The result is shown in the left part of Fig. 5. Using eq. (12) and $D_y \ge 2^{128}$, we obtain $d_{\mathbf{g}}(D_x) \le 2^{28}$, i.e. in any periodic solution of the $\mathbf{g}$-function at least one point will exist with degeneracy less than or equal to $2^{28}$. The result is illustrated by the right part of Fig. 5. Thus, using eq. (14), the period is

$$N_x \ge \frac{2^{256} - 1}{3^8 \cdot 2^{28}} \approx 2^{215}, \qquad (32)$$

and the diversity is $D_x \ge 2^{107}$.

Fig. 5. The left figure shows the degeneracy distribution of the $\mathbf{g}$-function, and the right figure shows the accumulated distribution together with the horizontal line illustrating the lower bound, $D_y \ge 2^{128}$. In both figures $n(d)$ denotes the number of points having degeneracy $d$.
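Combining the component degeneracies "in all possible ways" is a product-convolution of histograms. An image vector of the eight-component map has degeneracy $d_1 \cdots d_8$, and the number of such vectors is $n(d_1) \cdots n(d_8)$. The sketch below implements that combination and the threshold search of eq. (12). It runs on a hypothetical two-component toy histogram, because the measured 32-bit histogram behind Fig. 4 is not reproduced here. Applied with `k = 8` and `D_y = 2**128` to the real histogram, it would carry out the computation described above.

```python
from collections import Counter

def vector_histogram(component_hist, k):
    """Degeneracy histogram of k independent copies of one function: an image
    vector has degeneracy d_1*...*d_k, and there are n(d_1)*...*n(d_k) such vectors."""
    result = Counter({1: 1})
    for _ in range(k):
        nxt = Counter()
        for d1, n1 in result.items():
            for d2, n2 in component_hist.items():
                nxt[d1 * d2] += n1 * n2
        result = nxt
    return result

def degeneracy_at_threshold(hist, D_y):
    """Walk degeneracies in decreasing order until their running sum reaches D_y
    (eq. (12)) and return the degeneracy reached at that point."""
    total = 0
    for d in sorted(hist, reverse=True):
        total += d * hist[d]
        if total >= D_y:
            return d
    raise ValueError("D_y exceeds the size of the domain")

# Hypothetical component histogram: two outputs with 1 pre-image, one with 2.
toy = {1: 2, 2: 1}
vec = vector_histogram(toy, 2)
print(sorted(vec.items()), degeneracy_at_threshold(vec, 10))
```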



4 Statistical Analysis of Periods and Diversities 

Using the above analysis, we sample the degeneracy distribution of the system in 
eq. (20). Furthermore, this measurement can be used to argue that the internal 
state period will be larger than or equal to the counter period with a probability 
practically equal to one. 



4.1 Measuring the Degeneracy Distribution of the F-Function

The key observation is that for a given input vector, $y$, we know all other input vectors which might result in the same output vector, $x = w \bmod 2^{32}$. In order for $w_1$ and $w_2$ to be equal modulo $2^{32}$ we must have

$$w_1 = w_2 + 2^{32} k = w_2 + k + (2^{32} - 1) k, \qquad (33)$$

where $k \in \{-2, -1, 0, 1, 2\}^8$. Because of the linearity of the system given in eq. (22), we only need to find all pre-images of the vectors, $k$, and add each one of them to the input vector, $y$. More precisely, for all $k$ calculate all $y' = y + B^{-1} k \bmod (2^{32} - 1)$, then calculate all $x' = F(y') \bmod 2^{32}$ and look for matches². However, by this procedure we only obtain the specific matches where all vector components of $y'$ are different from $2^{32} - 1$. Therefore, we must also check the possibilities where one or more vector components are $2^{32} - 1$. This is done as follows: whenever a vector component of $y'$ is zero, we also try the vector where the zero component is replaced by $2^{32} - 1$.

In order to find all pre-images of the $k$ vectors we proceed as follows. As described in section 3.1, the matrix $B$ is 3-to-1. However, for a given output vector $k$ we can find each one of the three pre-images by using the CRT:

$$B^{-1} k = \left( a \, \frac{2^{32} - 1}{3} \, \tilde{k} + 3b \, B^{-1} \left( k \bmod \frac{2^{32} - 1}{3} \right) \right) \bmod (2^{32} - 1), \qquad (34)$$

where $a$ and $b$ are determined by $a (2^{32} - 1)/3 \bmod 3 = 1$ and $3b \bmod (2^{32} - 1)/3 = 1$, respectively. The matrix $B^{-1}$ on the right-hand side denotes the inverse of $B$ modulo $(2^{32} - 1)/3$. Finally, $\tilde{k}$ denotes one of the three pre-images of $k$ modulo 3, which can easily be found by searching through the possible $3^8$ input vectors to the matrix $B \bmod 3$.
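The constants $a$ and $b$ in eq. (34) are modular inverses, and the two terms of eq. (34) are the standard CRT recombination of a residue modulo 3 with a residue modulo $(2^{32} - 1)/3$. The sketch below computes $a$ and $b$ and shows the recombination for scalar residues. In eq. (34) the same step is applied componentwise to $\tilde{k}$ and to $B^{-1}(k \bmod (2^{32} - 1)/3)$. The modular matrix inverse itself is not computed here. `pow(x, -1, m)` requires Python 3.8 or later.

```python
M = (1 << 32) - 1
M3 = M // 3                        # (2^32 - 1)/3, coprime to 3

a = pow(M3, -1, 3)                 # a * (2^32 - 1)/3 = 1 (mod 3)
b = pow(3, -1, M3)                 # 3b = 1 (mod (2^32 - 1)/3)

def crt_combine(r3, r):
    """Unique value modulo 2^32 - 1 that is r3 mod 3 and r mod (2^32 - 1)/3,
    built the same way as the two terms of eq. (34)."""
    return (a * M3 * r3 + 3 * b * r) % M

print(a, b)
```

Here $a = 1$ because $(2^{32} - 1)/3 \equiv 1 \pmod 3$, and $b = 954437177$.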
Now it is straightforward to sample the degeneracy distribution. It was done as follows:

1. Pick a random input vector $y \in \{0, \ldots, 2^{32} - 2\}^8$.

2. Calculate $w = F(y)$, where the addition modulus is not performed.

3. Check each component, $w_j$, and select vectors $k$ where $k_j$ is in the corresponding interval, i.e.:

 - if $w_j \in \{0, \ldots, 2^{32} - 2\}$, select $k_j \in \{0, 1, 2\}$;

 - if $w_j \in \{2^{32} - 1, \ldots, 2 \cdot 2^{32} - 3\}$, select $k_j \in \{-1, 0, 1\}$;

 - if $w_j \in \{2 \cdot 2^{32} - 2, \ldots, 3 \cdot 2^{32} - 6\}$, select $k_j \in \{-2, -1, 0\}$.

4. This provides the candidates, each of the form $y' = y + B^{-1} k \bmod (2^{32} - 1)$, for pre-images which can result in the same image as $y$ under addition modulo $2^{32}$. Furthermore, if $y'_j = 0$ then also try $y'_j = 2^{32} - 1$.

5. Repeat $10^8$ times.
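Step 3 of the procedure, combined with the constraint (28), can be sketched as follows. Since $2^{32} \equiv 1 \pmod 3$ and each component offers three consecutive values of $k_j$, exactly one third of the $3^8$ candidate vectors survive, i.e. $3^7 = 2187$ for every $w$. The random $w$ below is only a stand-in for the output of step 2.

```python
import random
from itertools import product

T = 1 << 32   # 2^32

def k_choices(w_j):
    """Step 3: admissible k_j for one component w_j in {0, ..., 3*2^32 - 6}."""
    if w_j <= T - 2:
        return (0, 1, 2)
    if w_j <= 2 * T - 3:
        return (-1, 0, 1)
    return (-2, -1, 0)

def candidate_ks(w):
    """All 3^8 k-vectors from step 3, keeping only those that satisfy
    eq. (28): sum_j 2^32 k_j = 0 (mod 3)."""
    return [k for k in product(*(k_choices(wj) for wj in w))
            if sum(T * kj for kj in k) % 3 == 0]

rng = random.Random(33)
w = [rng.randrange(3 * T - 5) for _ in range(8)]   # stand-in for step 2's output
ks = candidate_ks(w)
print(len(ks))
```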

Fig. 6 shows the normalized result. It is normalized by dividing each count by 
the corresponding degeneracy. 

4.2 Expectation Values of the Period and Diversity of the $\mathbf{g}$-Function

It seems reasonable to assume that the internal state variables, $x_i$, and the counter variables, $c_i$, are independent. Therefore, the expectation value of the diversity, $D_y$, of their sum should be much higher than the bound given above:

$$\langle D_y \rangle \approx (1 - e^{-1}) \cdot 2^{256} \approx 2^{255.34}, \qquad (35)$$

where the pre-factor, $1 - e^{-1}$, originates from the fact that if $n = 2^{256} - 1$ (minimal $N_y$) balls are thrown into $m = 2^{256}$ urns, a proportion of about $e^{-n/m} \approx e^{-1}$ of the urns will remain empty (see e.g. [6] for more details). According to eq. (8), if a point with degeneracy one belongs to a periodic solution, the period will be at least the counter period. The probability that no such vector belongs to the periodic solution can be calculated as follows. In the $g$-function, 2721872779 input values are many-to-1 and 1573094517 input values are 1-to-1. Consequently, $1573094517^8 \approx 2^{244.41}$ input vectors to the $\mathbf{g}$-function are 1-to-1.

² Note that we symbolically write $B^{-1}$ even though $B$ is not invertible modulo $2^{32} - 1$. In other words, $B^{-1} k$ is just a shorthand notation for the resulting pre-images.

Fig. 6. The normalized measured degeneracy distribution for the combining function, $F$, where $n(d)$ denotes the number of points having degeneracy $d$.
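The balls-in-urns pre-factor behind eq. (35) can be checked with a small simulation. Throwing $n = m$ balls uniformly into $m$ urns leaves a fraction of about $1 - e^{-1} \approx 0.632$ of the urns occupied. The sketch uses a hypothetical $m = 200{,}000$ in place of $2^{256}$ and then expresses eq. (35) as a power of two.

```python
import math
import random

def occupied_fraction(n_balls, m_urns, rng):
    """Throw n balls uniformly into m urns; return the fraction of non-empty urns."""
    return len({rng.randrange(m_urns) for _ in range(n_balls)}) / m_urns

rng = random.Random(35)
m = 200_000
frac = occupied_fraction(m, m, rng)        # n = m, as with n = 2^256 - 1, m = 2^256
print(frac, 1 - math.exp(-1))              # simulated vs. 1 - e^{-1}

# eq. (35): <D_y> ~ (1 - e^{-1}) * 2^256, expressed as a power of two
log2_Dy = 256 + math.log2(1 - math.exp(-1))
print(f"<D_y> ~ 2^{log2_Dy:.2f}")
```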

The probability for an input vector to be many-to-1 is thus given by

$$1 - \frac{1573094517^8}{2^{256}} \approx 0.999676, \qquad (36)$$

but the probability that all $2^{255.34}$ different input vectors are many-to-1 is $0.999676^{2^{255.34}}$, which is by any measure zero. Consequently, it is reasonable to assume that the period of the $\mathbf{g}$-function, i.e. without the combining function $F$, is $N_y$. The diversity, $D_g$, can be calculated by the same method used in eq. (35) above: let $n(d)$ be the number of points having degeneracy $d$ in the $g$-function; then there are $|\mathrm{Im}(\mathbf{g})| = \left(\sum_d n(d)\right)^8 \approx 2^{250.68}$ possible $\mathbf{g}$ images. Therefore, the expected diversity is

$$\langle D_g \rangle \approx \left(1 - e^{-2^{255.34}/2^{250.68}}\right) \cdot 2^{250.68} \approx 2^{250.68}. \qquad (37)$$
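The figures in eqs. (36) and (37) follow from the two input counts quoted above. The sketch works in the log domain, so that quantities such as $2^{255.34}$ never have to be formed as floating-point numbers.

```python
import math

one_to_one = 1573094517           # g inputs that are 1-to-1 (from the text)
many_to_one = 2721872779          # g inputs that are many-to-1 (from the text)
assert one_to_one + many_to_one == 2**32

log2_one_to_one_vectors = 8 * math.log2(one_to_one)
p_many = 1 - (one_to_one / 2**32) ** 8          # eq. (36)

# eq. (37): balls-in-urns with <D_y> ~ 2^255.34 balls and |Im(g)| ~ 2^250.68 urns
log2_Dy, log2_img = 255.34, 250.68
log2_Dg = log2_img + math.log2(1 - math.exp(-2 ** (log2_Dy - log2_img)))
print(f"{log2_one_to_one_vectors:.2f} {p_many:.6f} {log2_Dg:.2f}")
```

Because there are about $2^{4.66} \approx 25$ balls per urn, the factor $1 - e^{-25}$ is indistinguishable from 1. This is why $\langle D_g \rangle$ equals $|\mathrm{Im}(\mathbf{g})|$ to the stated precision.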



4.3 Expectation Values of the Internal State Period and Diversity 

In order to calculate the expectation values of the internal state period, $N_x$, and diversity, $D_x$, we repeat the same analysis as in section 4.2. The probability for an input vector to be many-to-1 is

$$\frac{10^8 - 37820209}{10^8} \approx 0.621798, \qquad (38)$$

i.e. the number of samples minus the number of counted 1-to-1 points, divided by the number of samples (see Fig. 6). Consequently, the probability that all $2^{250.68}$ input vectors are many-to-1 is $0.621798^{2^{250.68}}$, which is again negligible. Thus, according to eq. (8), we claim that the internal state period length is at least $N_c$. The expectation value of the diversity is obtained as in eq. (37): let $n(d)$ be the measured number of points having degeneracy $d$ in the $F$-function; then there are $|\mathrm{Im}(F)| \approx \frac{2^{256}}{10^8} \cdot \sum_d n(d) \approx 2^{255.36}$ possible $F$ images. Thus,

$$\langle D_x \rangle \approx \left(1 - e^{-2^{250.68}/2^{255.36}}\right) \cdot 2^{255.36} \approx 2^{250.65}. \qquad (39)$$
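Eqs. (38) and (39) can be recomputed in the same log-domain style from the sample count and the counted 1-to-1 points. This time there are fewer balls ($2^{250.68}$) than urns ($2^{255.36}$), so the occupied fraction is small. That is why the expected diversity drops slightly below the number of balls.

```python
import math

samples = 10**8
one_to_one_count = 37820209                       # counted 1-to-1 points (from the text)
p_many = (samples - one_to_one_count) / samples   # eq. (38)

log2_balls, log2_urns = 250.68, 255.36            # <D_g> balls, |Im(F)| urns
log2_Dx = log2_urns + math.log2(1 - math.exp(-2 ** (log2_balls - log2_urns)))  # eq. (39)
print(f"{p_many:.6f} {log2_Dx:.2f}")
```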



We conclude that the expectation value for the period of the counter assisted next-state function of Rabbit is at least that of the counter system and, moreover, a very large internal state diversity is expected. Finally, note that even if the strict bounds for the diversities, $D_y \ge 2^{128}$ and the corresponding strict bound on $D_g$, were used in the above calculations, this conclusion would remain the same.



5 Analysis of the Mod 3 Characteristic 

As explained in section 3, eq. (26), there is a “mod 3” characteristic of the next-state function of Rabbit (see [7] for a general discussion of “mod n” attacks and [8] for a “mod n” analysis of Rabbit). If the results of the additions are not reduced modulo $2^{32}$, then the sum of the eight output vector components is always zero modulo three. Taking the additions modulo $2^{32}$, we still measure a distribution of the sum modulo three that is not uniform:

$$\sum_{j=0}^{7} x_j \bmod 3 = \begin{cases} 0 & \text{with probability } 0.3337, \\ 1 & \text{with probability } 0.3337, \\ 2 & \text{with probability } 0.3326. \end{cases} \qquad (40)$$

This bias can to a high degree be reproduced by (tedious) analytical calculations (see [8] and [9] for details). If this bias were detectable in the extracted output, it would allow an attacker to mount a distinguishing attack. Since the property of $\left(\sum_{j} x_j\right) \bmod 3$ does not depend in any way on the value of the key, it is not possible to recover the key from it.

Table 1. Numerical entropies. The figures for the joint distribution were obtained by computing the full distribution. The figures are therefore exact, except for rounding errors.

8-bit registers   H(A)     H(B)   H(A,B)
3                 1.5850   12     13.5805
4                 1.5850   16     17.5850

Since the extracted output consists of the XOR values of the upper and lower halves of two $x_j$-values, there is no way to reconstruct the sum of the vector components from the output (see [5] for details about the extraction function). Consequently, a necessary condition for detecting a bias in the output is that the following two distributions are dependent:

$$A(x) = \sum_{j=0}^{7} \left( (2^{16} x_{j,h} + x_{j,l}) \bmod 2^{32} \right) \bmod 3, \qquad (41)$$

$$B(x) = (x_{0,h} \oplus x_{3,l},\; x_{1,h} \oplus x_{4,l},\; \ldots,\; x_{7,h} \oplus x_{2,l}), \qquad (42)$$

where the subscripts $h$ and $l$ denote the upper and lower 16 bits of $x_j$, respectively. We have done several measurements in order to detect any dependence between the two distributions, but have not found any. For instance, we computed the entropy, $H$, of the joint distribution $(A, B)$ on down-scaled versions using three and four 8-bit registers, respectively, instead of eight 32-bit registers. We used uniform distributions for the $x_j$. The results are presented in Table 1. It can be seen that the entropy of the joint distribution converges quickly to the sum of the entropies of the distributions of $A$ and $B$. We conclude that when eight registers are used, the distributions will be almost completely independent. A fortiori, a cryptanalyst will not be able to notice the “mod 3” bias of the internal state in the output.
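The entropy experiment can be illustrated with an exhaustive count over uniformly distributed registers. The sketch below is a further down-scaled illustration, not a reproduction of Table 1. It assumes two 8-bit registers (so all $2^{16}$ states are enumerated), $A$ as the plain sum modulo 3, and a hypothetical pairing offset of 1 between the high half of $x_j$ and the low half of $x_{j+1}$ in $B$. Independence of $A$ and $B$ would show up as $H(A, B) \approx H(A) + H(B)$.

```python
import math
from collections import Counter
from itertools import product

def entropy(counter):
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())

def joint_entropies(r, bits, offset):
    """Exhaustive entropy count for r uniform registers of `bits` bits.
    A(x) = (sum x_j) mod 3; B pairs the high half of x_j with the low half of
    x_{(j+offset) mod r}. The offset is a hypothetical stand-in for Rabbit's
    extraction pattern."""
    half = bits // 2
    lo_mask = (1 << half) - 1
    cA, cB, cAB = Counter(), Counter(), Counter()
    for x in product(range(1 << bits), repeat=r):
        a = sum(x) % 3
        b = tuple((x[j] >> half) ^ (x[(j + offset) % r] & lo_mask) for j in range(r))
        cA[a] += 1
        cB[b] += 1
        cAB[(a, b)] += 1
    return entropy(cA), entropy(cB), entropy(cAB)

hA, hB, hAB = joint_entropies(r=2, bits=8, offset=1)
print(f"H(A)={hA:.4f} H(B)={hB:.4f} H(A,B)={hAB:.4f} H(A)+H(B)={hA + hB:.4f}")
```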

6 Conclusions 

In this paper we analyzed the periodic properties of counter assisted generators where the counter system also provides additional complexity. As an example, we extended the lower bound on the internal state period of the recently proposed stream cipher Rabbit from the bound given in [5] to $2^{215}$. By assuming that the internal state and the counter state are independent and applying the general analysis, we found that the internal state period of Rabbit is at least the counter period, i.e. $2^{256} - 1$. Furthermore, we discussed the “mod 3” characteristic of the next-state function. Attacks based on this characteristic were discussed and found infeasible.
Abstract. In this paper, an improved fast correlation attack on stream ciphers is presented. The proposed technique is based on the construction of an unequal error protecting LDPC code from the LFSR output sequence. The unequal error protection makes it possible to achieve a lower bit-error probability for the initial bits of the LFSR compared to the rest of the output bits. We show that constructing the unequal error protecting code also has the advantage of reducing the number of output bits involved in decoding to less than the number of available keystream output bits. Our decoding approach is based on a combination of exhaustive search over a subset of information bits and a soft-decision iterative message passing decoding algorithm. We compare the performance of the proposed algorithm with the recent fast correlation attacks. Our results show that we can halve the number of bits obtained by exhaustive search and still get better performance compared to recent fast correlation attacks based on iterative decoding algorithms. Using the expected number of parity-check equations of certain weights, we find the lower bound on the number of information bits that needs to be obtained by the exhaustive search without compromising the performance.

Keywords. Stream ciphers, fast correlation attacks, linear feedback shift 
registers, cryptanalysis, LDPC codes. 



1 Introduction 

One of the most remarkable of all ciphers is the one-time-pad, where the plaintext message is added bit by bit (or, in general, character by character) to a random sequence of the same length. The remarkable fact about the one-time-pad is its perfect security. Assuming a ciphertext-only attack, Shannon proved that even with infinite computing resources, the cryptanalyst could never separate the true plaintext from all other meaningful plaintexts. The disadvantage is the unlimited amount of key required [1]. The appealing feature of the one-time-pad suggested building synchronous stream ciphers, which encipher the plaintext by use of a pseudo-random sequence. This removes the requirement of an unlimited key. The pseudo-random sequence is generated, under the control of a secret key, by a deterministic algorithm called the keystream generator. To ensure security, it must not be possible to predict any portion of the keystream better than by random guessing, regardless of the number of keystream bits already observed.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 54-66, 2004.

Linear feedback shift registers (LFSRs) are the basic components of most keystream generators. As shown in Figure 1, one method of generating the keystream is to combine a fixed number of LFSRs' outputs by means of a nonlinear function $f$. To resist cryptographic attacks using the Berlekamp-Massey algorithm, the function $f$ is chosen so that a sequence with high linear complexity is obtained. The secret key is the initial state of each LFSR. The characteristic polynomial of each LFSR of length $k_i$ is assumed to be known by the cryptanalyst. The total number of key bits required to specify the initial state of the stream cipher generator is $\sum_{i=1}^{R} k_i$, where $R$ is the number of LFSRs. In a brute force attack, $\prod_{i=1}^{R} (2^{k_i} - 1)$ possible states of the LFSRs are examined, which is not feasible in practical systems.




Fig. 1. Keystream generator. 



In [2], Siegenthaler showed that if there exists a correlation between the keystream sequence and the outputs of the LFSRs, it is possible to determine the initial state of each LFSR independently, thereby reducing the cryptanalytic attack to a divide-and-conquer attack with approximate complexity of $\sum_{i=1}^{R} (2^{k_i} - 1)$. Siegenthaler's attack amounts to an exhaustive search through the state space of each individual LFSR.
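The gap between the brute-force product and the divide-and-conquer sum is easy to quantify. The sketch below assumes three hypothetical register lengths, $k = (17, 19, 23)$. It prints the key size and the bit lengths of the two state counts: the product needs 59 bits, while the sum stays close to the largest single register.

```python
from math import prod

def attack_costs(lengths):
    """Key size, brute-force state count and Siegenthaler-style
    divide-and-conquer state count for LFSRs of the given lengths k_i."""
    key_bits = sum(lengths)
    brute = prod(2**k - 1 for k in lengths)
    divide = sum(2**k - 1 for k in lengths)
    return key_bits, brute, divide

key_bits, brute, divide = attack_costs([17, 19, 23])   # hypothetical register lengths
print(key_bits, brute.bit_length(), divide.bit_length())
```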

Later, it was shown by Meier and Staffelbach that if the number of taps t of 
the characteristic polynomial is small, it is possible to determine the initial state 
of the LFSR by means of an iterative algorithm that has a complexity much 
less than the exhaustive search [3]. This work was followed by several papers, 
providing minor improvements to the initial results [4, 5, 6, 7]. 

In [8], Johansson and Jonsson proposed a novel algorithm based on identifying an embedded low-rate convolutional code from the LFSR output sequence. This embedded low-rate convolutional code can then be decoded with low complexity using the Viterbi algorithm. Their approach relies on a decoding algorithm that requires memory. This algorithm provides a remarkable improvement over previous methods. In [9], a new method for the fast correlation attack was proposed based on constructing and decoding turbo codes. For a fixed memory size, this technique provides better performance than the method in [8]. The price for the improved performance is an increased computational complexity. One of the advantages of the attacks based on convolutional and turbo codes is that they do not require a low weight feedback polynomial. In [10], a new simple algorithm for fast correlation attacks on stream ciphers was presented. Although this algorithm is influenced by [8,9], it has the advantage that it reduces the memory requirement significantly. A detailed comparative study of the algorithms is given in [11].

Canteaut and Trabbia showed in [12] that the Gallager iterative decoding algorithm using parity-check equations of weight four or five is more efficient than the attacks proposed in [8,9]. The large memory usage makes the complexity of the Viterbi decoding high. One advantage of the attack based on convolutional codes is the lower complexity of the preprocessing step, but this part of the attack is performed once, while the decoding step is repeated for each new initialization of the system. An algorithm based on decoding of the binary block code whose parity-checks are of low weight was proposed in [13]. The authors employed a combination of restricted exhaustive search over a set of hypotheses and a one-step or an iterative decoding technique. The exhaustive search is employed to make it possible to construct the suitable parity-check equations needed for high performance decoding. In [15], Chose et al. presented some major algorithmic improvements. This improvement is achieved at the cost of some loss of parallelism compared with the method presented in [12]. They focused on the search for efficient algorithms to find and evaluate parity-check equations. The main idea is combined with a partial exhaustive search to yield an efficient cryptanalysis.

In this paper, we propose a new technique for the fast correlation attack that is based on constructing a low-density parity-check (LDPC) code from the LFSR output sequence. The key idea in our approach is to construct an LDPC code that provides a lower decoding error rate for the information bits (the initial state of the LFSR) than for the rest of the LFSR output bits. We show that the structure of the underlying LDPC code constructed from the LFSR output sequence has a crucial role in the decoding performance. Motivated by the approach of [13,15], we also develop a novel algorithm for the construction of parity-check equations. Our decoding approach is based on a combination of exhaustive search over a set of information bits and a soft-decision iterative message passing decoding algorithm [16].

The paper is organized as follows. Section 2 reviews the decoding model for the fast correlation attack. Section 3 presents the underlying idea of the proposed fast correlation attack, the procedure for the construction of the parity-check equations, and the decoding steps of the iterative message passing algorithm. The expected cardinalities of the parity-check equations are given in Section 4. Using the expected number of parity-check equations of certain weight, a lower bound on $B$, the number of information bits obtained by exhaustive search, is derived in Section 5. Section 6 presents the performance of the proposed algorithm. Finally, a comparison of the proposed algorithm with the recent fast correlation attacks is provided in Section 7.

2 Decoding Model for the Fast Correlation Attack

Most authors view the problem of finding the initial state of the LFSR as a decoding problem. Assume that the target LFSR has length $L$ and let $\mathcal{L}$ denote the set containing all the distinct LFSR sequences. Clearly, $|\mathcal{L}| = 2^L$, and for a fixed length $N$, the truncated sequences from $\mathcal{L}$ form a linear $(N, L)$ block code. Therefore, the observed keystream sequence $z = z_1, z_2, \ldots, z_N$ may be regarded as the received channel output and the LFSR sequence $x = x_1, x_2, \ldots, x_N$ is regarded as a codeword from an $(N, L)$ linear block code. Due to the correlation between $x_i$ and $z_i$, we can view each $z_i$ as the output of a binary symmetric channel (BSC) when $x_i$ is transmitted. The correlation probability defined by

$$\Pr(x_i = z_i) = 1 - p = 1/2 + \epsilon$$

determines the crossover probability $p$ of the BSC. This is shown in Figure 2.



Fig. 2. Model of the correlation attack (blocks: LFSR, BSC).
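The channel model of Fig. 2 can be simulated in a few lines. The sketch below assumes a hypothetical 5-stage LFSR with primitive feedback polynomial $x^5 + x^2 + 1$, i.e. the recurrence $x_{t+5} = x_{t+2} \oplus x_t$, so every non-zero initial state gives period 31. Its output passes through a BSC with $p = 0.25$, and the measured agreement estimates $\Pr(x_i = z_i) = 1 - p = 1/2 + \epsilon$.

```python
import random

def lfsr_sequence(init, taps, n):
    """Binary LFSR with recurrence x_{t+L} = XOR of x_{t+i} for i in taps.
    Example (hypothetical): taps (0, 2) with L = 5 realises x^5 + x^2 + 1."""
    x = list(init)
    L = len(init)
    while len(x) < n:
        t = len(x) - L
        x.append(sum(x[t + i] for i in taps) % 2)
    return x[:n]

def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(2)
x = lfsr_sequence([1, 0, 0, 1, 1], taps=(0, 2), n=100_000)
z = bsc(x, p=0.25, rng=rng)
agreement = sum(xi == zi for xi, zi in zip(x, z)) / len(x)
print(agreement)     # estimates Pr(x_i = z_i) = 1 - p = 1/2 + eps
```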



Therefore, the cryptanalyst’s problem is to restore the LFSR’s initial state 
$(x_1, x_2, \ldots, x_L)$ from the observed output sequence $z = (z_1, z_2, \ldots, z_N)$, given the 
feedback polynomial of the LFSR of degree L. Let H(p) be the binary entropy 
function $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$. Then, by Shannon’s theory, for a code 
rate R = L/N that is less than the capacity C(p) = 1 − H(p) of the BSC, 
there exists a code for which the decoding error probability approaches zero as 
N goes to infinity. 
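The capacity condition can be checked numerically. The following Python sketch, whose function names are our own, computes H(p), C(p) = 1 − H(p) and whether a rate R = L/N lies below capacity; it says nothing about any particular decoder.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity C(p) = 1 - H(p) of a binary symmetric channel with crossover p."""
    return 1.0 - binary_entropy(p)


def rate_below_capacity(L: int, N: int, p: float) -> bool:
    """True when the code rate R = L/N is strictly below C(p)."""
    return L / N < bsc_capacity(p)
```

For instance, `rate_below_capacity(40, 4000, p)` indicates whether, by Shannon's bound alone, 4000 keystream bits from a length-40 LFSR could in principle suffice at noise level p; the bound is asymptotic, so it is only a guide for finite N.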



3 Underlying Idea for the Fast Correlation Attack 

The proposed algorithm constructs a low-density parity-check (LDPC) code from 
the LFSR output sequence that is defined by a sparse parity-check matrix H. The 
resulting (N, L) LDPC code is irregular: an LDPC code is irregular if the number 
of 1’s per column of H or the number of 1’s per row of H is allowed to vary. An 
LDPC code can be represented by a bipartite graph consisting of check nodes and 
variable (bit) nodes. The N − L rows of H specify the N − L check node 
connections, and the N columns of H specify the N bit node connections: a bit 
node is connected to a check node if and only if the corresponding element of H 
is one. It is preferable to have bit nodes with a high degree, since they receive 
more information from the check nodes, allowing a more accurate judgement of 
the correct bit value. On the other hand, it is preferable to have check nodes 
with a low degree, since the information they send to the variable nodes is then 
more valuable. Because every edge of the bipartite graph joins one bit node to 
one check node, the sum of the bit-node degrees equals the sum of the check-node 
degrees; hence, if the degrees of the bit nodes are high (low), then the degrees 
of the check nodes must be high (low) as well. It is possible, however, to 
concentrate the parity-check equations on the information bits, raising the degrees 
of some variable nodes while keeping the degrees of the check nodes constant. 
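To make the degree trade-off concrete, here is a small Python sketch (ours, not from the paper) that represents H by the list of column positions holding a one in each row and computes the bit-node and check-node degrees. The toy matrix is hypothetical; its first column plays the role of an information bit whose degree is raised while every check keeps degree three.

```python
from collections import Counter


def node_degrees(checks, n):
    """Degrees of the bipartite (Tanner) graph of H.

    `checks` holds one list per row of H with the column indices that are 1,
    so each row is a check node and each column 0..n-1 is a bit node.
    """
    counts = Counter(v for row in checks for v in row)
    bit_degrees = [counts[v] for v in range(n)]
    check_degrees = [len(row) for row in checks]
    # Sanity check: every edge joins one bit node and one check node,
    # so both degree sums count the same set of edges.
    assert sum(bit_degrees) == sum(check_degrees)
    return bit_degrees, check_degrees


# Hypothetical toy H: column 0 appears in three checks, all checks have degree 3.
H_rows = [[0, 1, 4], [0, 2, 5], [0, 3, 6], [1, 2, 7]]
bits, checks = node_degrees(H_rows, 8)
print(bits)    # [3, 2, 2, 1, 1, 1, 1, 1]
print(checks)  # [3, 3, 3, 3]
```

Because both sums count the edges, the average bit-node degree equals (N − L)/N times the average check-node degree, which is why only the distribution of the bit-node degrees, not their average, can be shifted freely.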

In the fast correlation attack, we want to recover the initial state of the LFSR 
completely and are not interested in the values of the rest of the output 
sequence. This motivated us to construct the underlying LDPC code such that 
the bit error probability is lower for the initial state of the LFSR than for the 
rest of the output sequence. This can be achieved by increasing the degrees of the 
variable nodes that correspond to the initial state of the LFSR. Raising the 
degrees of the output bits that are directly connected to the initial bits also 
protects these bits more than the rest of the output sequence; since they are 
directly connected to the initial bits, they in turn help to lower the bit error 
probability of the initial bits. 

High-performance decoding of LDPC codes requires low-weight parity-check 
equations. Therefore, we must employ low-weight parity-check equations that 
involve as many information bits as possible. Then, we use the parity-check 
equations that raise the degrees of the output bits directly connected to the 
initial state of the LFSR. When H is constructed in this way, some of the output 
nodes have degree zero, so they are not involved in the decoding phase. This has 
the additional advantage of reducing the decoding complexity: instead of decoding 
a code of length N, a shorter sequence is decoded. 
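The shortening step can be sketched as follows, assuming H is stored as one list of column indices per row (a hypothetical representation, not the authors' data structure): columns of degree zero are dropped and the remaining ones are re-indexed, so the decoder only sees the shorter sequence.

```python
def active_columns(checks):
    """Keep only bit nodes that appear in at least one check.

    Returns the sorted original indices of the surviving columns and the
    checks rewritten in terms of the new, compact column indices.
    """
    used = sorted({v for row in checks for v in row})
    new_index = {v: k for k, v in enumerate(used)}
    reduced = [[new_index[v] for v in row] for row in checks]
    return used, reduced


# Hypothetical checks over a length-10 sequence: bits 1, 2, 4, 6, 8, 9 have degree zero.
used, reduced = active_columns([[0, 3], [0, 5], [3, 5, 7]])
print(used)     # [0, 3, 5, 7]  (only 4 of the 10 bits are decoded)
print(reduced)  # [[0, 1], [0, 2], [1, 2, 3]]
```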

The bipartite structure of the proposed LDPC code is shown in Figure 3. 
Here, the bit nodes and the check nodes are represented by circles and squares, 
respectively. We have three sets of variable nodes: the initial bits of degree $d_i$, 
parity bits of degree $d_j$, and parity bits of degree $d_k$. Note that we choose 
$d_i > d_j > d_k$ for unequal protection. There are two sets of parity-check equations. 
Parity checks $c_1$ of degree $d_{c_1}$ involve both information bits and the rest of the 
LFSR’s output bits. Parity checks $c_2$ of degree $d_{c_2}$ do not involve the initial bits. 

Fig. 3. Bipartite graph of the proposed LDPC code. 



3.1 Generating Parity Check Equations 

The preprocessing step of our attack consists of generating low-weight parity-check 
equations that provide better performance and stronger error protection 
for the LFSR initial bits. The performance of a set of parity-check equations 
depends on their cardinality as well as on their weight distribution. Therefore, 
we are looking for sufficiently low-weight parity-check equations. If the feedback 
polynomial of the LFSR is long and of high weight, it is not possible to 
find sufficiently powerful parity-check equations when the available keystream 
output sequence is short. Using the same idea as in [13,15], B bits of 
the initial state are found through exhaustive search and the remaining L − B bits 
are obtained using parity-check equations. This relaxes the constraint on the 
parity-check equations: we require that they have low weight 
when the first B initial bits have arbitrary values. 

We choose to construct the parity-check matrix H with parity equations of 
weights three and four. We first fill the matrix H with those parity equations 
that have the largest involvement of the information bits. Since the number of 
initial bits of the LFSR is much smaller than the number of output bits, this makes 
the degrees of the initial bits larger than those of the rest of the output sequence. 
Let the length of the available output sequence be N, and the degree of the feedback 
polynomial f(x) be L. First, we generate parity-check equations that involve 
exactly one output bit and two or three information bits. These parity-check 
equations can be obtained as follows: 

- Compute all the residues $q_i(x) = x^i \bmod f(x)$ for $L \le i < N$ and store them 
in a table T. 

- Search the table for parity-check equations that have weight four or less 
when the first B information bits take arbitrary values. 
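The two steps above can be sketched in Python under our own encoding choices: a residue $x^i \bmod f(x)$ is stored as an integer whose bit t is the coefficient of $x^t$, so it expresses output bit $x_i$ as the XOR of the (0-based) initial bits $x_t$; bits t < B are taken to be the exhaustively searched ones and are not counted in the weight. Function names and the toy polynomial are hypothetical.

```python
def residue_table(f, L, N):
    """Table T: {i: x^i mod f(x)} over GF(2) for L <= i < N.

    f is an int whose bit t is the coefficient of x^t (bit L must be set).
    Bit t of each residue says whether initial bit x_t appears in output bit x_i.
    """
    table = {}
    r = 1 << (L - 1)              # x^(L-1)
    for i in range(L, N):
        r <<= 1                   # multiply by x
        if (r >> L) & 1:          # degree reached L: reduce modulo f(x)
            r ^= f
        table[i] = r
    return table


def single_output_checks(table, B, max_weight=4):
    """Checks with one output bit whose weight, ignoring bits 0..B-1, is <= max_weight."""
    found = []
    for i, r in table.items():
        unknown = r >> B          # coefficients of the L - B bits not guessed
        weight = 1 + bin(unknown).count("1")
        if weight <= max_weight:
            found.append((i, r))
    return found


# Toy example: f(x) = 1 + x + x^4, so L = 4.
T = residue_table(0b10011, L=4, N=8)
print(T)                                            # {4: 3, 5: 6, 6: 12, 7: 11}
print(single_output_checks(T, B=1, max_weight=2))   # [(4, 3)]: x_4 = x_0 + x_1
```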

We use all the parity-check equations that are constructed by the above criteria. 
Then we employ parity-check equations that involve exactly two output bits and 
one or two information bits. These parities can be constructed as follows: 

- Take all the choices of two out of the N − L entries of table T and generate a 
new table R by XORing those pairs. 

- Search the table R for parity-check equations that have weight four or less 
when the first B information bits have arbitrary values. 
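A minimal sketch of this second step, using the same integer encoding of residues (bit t = coefficient of $x^t$, bits 0..B−1 treated as known). For brevity it filters the pairs directly instead of storing the full table R; the function name is hypothetical.

```python
from itertools import combinations


def pair_checks(T, B, max_weight=4):
    """Low-weight entries of table R.

    T maps output index i -> x^i mod f(x). XORing two residues expresses
    x_i + x_j as a sum of initial bits; bits 0..B-1 are not counted.
    """
    found = []
    for (i, ri), (j, rj) in combinations(T.items(), 2):
        combined = ri ^ rj
        weight = 2 + bin(combined >> B).count("1")  # two output bits + unknown initial bits
        if weight <= max_weight:
            found.append((i, j, combined))
    return found


# Residues of x^4..x^7 modulo the toy polynomial f(x) = 1 + x + x^4.
T = {4: 0b0011, 5: 0b0110, 6: 0b1100, 7: 0b1011}
print(pair_checks(T, B=1, max_weight=3))  # [(4, 5, 5), (4, 7, 8)]
```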

Finding parity-check equations that involve three or four output bits by taking 
all three- or four-element combinations from table T has a high complexity. We use 
the same idea as in [15] to find these parities with much lower complexity. To 
find parities that involve three output bits when the B information bits have 
arbitrary values, we proceed as follows: 

- Find the indices of tables T and R such that (L − B)/2 of the initial bits 
have a certain pattern. 

- Among those entries, take an element from table T and an element from 
table R such that they have the same values in the remaining (L − B)/2 
initial bits. 

- Repeat the above procedure for every possible pattern of the first (L − B)/2 
initial bits of the LFSR. 
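The three-step procedure above amounts to a two-level bucket join between T and R. The sketch below is our reading of it, not the authors' code: it uses the integer encoding of residues from the previous sketches, treats bits 0..B−1 as the guessed ones, splits the remaining L − B bits into a first half of ⌊(L − B)/2⌋ bits and the rest (so L − B need not be even), and, for brevity, builds all of R explicitly, which a real attack on large N would avoid.

```python
from collections import defaultdict
from itertools import combinations


def three_output_checks(T, L, B):
    """Triples (i, j, k) of output bits whose residues XOR to zero on the L - B unknown bits."""
    h = (L - B) // 2                  # size of the first half of the unknown bits
    low = (1 << h) - 1
    masked_T = {i: r >> B for i, r in T.items()}
    masked_R = {(j, k): masked_T[j] ^ masked_T[k] for j, k in combinations(sorted(T), 2)}

    # Stage 1: bucket both tables by the pattern of the first half.
    t_by_half = defaultdict(list)
    for i, m in masked_T.items():
        t_by_half[m & low].append((i, m >> h))
    r_by_half = defaultdict(list)
    for pair, m in masked_R.items():
        r_by_half[m & low].append((pair, m >> h))

    found = set()
    for pattern, t_entries in t_by_half.items():
        # Stage 2: within one pattern, match the remaining bits.
        by_rest = defaultdict(list)
        for i, rest in t_entries:
            by_rest[rest].append(i)
        for (j, k), rest in r_by_half.get(pattern, []):
            for i in by_rest.get(rest, []):
                if i not in (j, k):   # three distinct output bits
                    found.add(tuple(sorted((i, j, k))))
    return found


# Residues of x^4..x^11 modulo the toy polynomial f(x) = 1 + x + x^4 (L = 4), with B = 1.
T = {4: 0b0011, 5: 0b0110, 6: 0b1100, 7: 0b1011,
     8: 0b0101, 9: 0b1010, 10: 0b0111, 11: 0b1110}
print(sorted(three_output_checks(T, L=4, B=1)))
```

Matching both halves is the same as requiring the XOR of the three residues to vanish on all L − B unknown bits; splitting the match only reduces the size of the tables handled at once.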



[Figure: tables T and R; bit fields labelled B, L − B and (L − B)/2]

Fig. 4. Illustration of constructing parity-check equations of weight three. 



The algorithm is illustrated by Figure 4. To find parities that involve three 
output bits and one information bit, we follow the same procedure as for 
parities of three output bits, but each time we place no restriction on one 
chosen initial bit (one of the L − B bits). Among the parities found, we keep 
those in which the chosen bit has value one. 

The same method is applied to find parities that involve only four output bits. 
However, instead of searching both tables T and R for a certain pattern of the first 
(L − B)/2 initial bits, only table R needs to be searched. Among entries whose 
first (L − B)/2 initial bits are the same, we look for any two elements that have 
the same value in the next (L − B)/2 bits. 
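The four-output-bit variant can be sketched the same way, again as a hypothetical illustration with the integer encoding of residues used above: only pairs from R are bucketed, first by the low half and then by the high half of their unknown bits, and two pairs in the same bucket with no shared output bit give a check.

```python
from collections import defaultdict
from itertools import combinations


def four_output_checks(T, L, B):
    """Quadruples of output bits whose residues XOR to zero on the L - B unknown bits."""
    h = (L - B) // 2
    low = (1 << h) - 1
    masked = {i: r >> B for i, r in T.items()}
    # groups[first half][second half] -> list of pairs from table R
    groups = defaultdict(lambda: defaultdict(list))
    for j, k in combinations(sorted(T), 2):
        m = masked[j] ^ masked[k]
        groups[m & low][m >> h].append((j, k))
    found = set()
    for by_rest in groups.values():
        for pairs in by_rest.values():
            for (a, b), (c, d) in combinations(pairs, 2):
                if len({a, b, c, d}) == 4:        # four distinct output bits
                    found.add(tuple(sorted((a, b, c, d))))
    return found


# Residues of x^4..x^11 modulo the toy polynomial f(x) = 1 + x + x^4 (L = 4), with B = 1.
T = {4: 0b0011, 5: 0b0110, 6: 0b1100, 7: 0b1011,
     8: 0b0101, 9: 0b1010, 10: 0b0111, 11: 0b1110}
print(sorted(four_output_checks(T, L=4, B=1)))
```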

3.2 Iterative Decoding Algorithm 

After generating the parity-check matrix H, we decode the keystream output 
sequence using the message passing algorithm discussed in [16] over the binary 
symmetric channel to recover the initial bits of the LFSR. 
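The decoder of [16] is not spelled out in this section. As a stand-in, the sketch below is a textbook log-domain sum-product (belief-propagation) decoder for a BSC, which may differ in details from [16]. As our own modelling assumption, once the B bits are fixed by the exhaustive search, their contribution to each check is folded into a required parity (the hypothetical `syndromes` argument).

```python
import math


def sum_product_decode(z, checks, p, syndromes=None, max_iters=50):
    """Log-domain sum-product decoding of a binary code observed through a BSC.

    z: received bits (0/1); checks: one list of bit indices per row of H;
    p: crossover probability, 0 < p < 1/2; syndromes: required parity of
    each check (default all 0). Returns the hard-decision bit estimates.
    """
    n = len(z)
    if syndromes is None:
        syndromes = [0] * len(checks)
    prior = math.log((1 - p) / p)                  # LLR = log P(x=0) / P(x=1)
    channel = [prior if bit == 0 else -prior for bit in z]
    bit_checks = [[] for _ in range(n)]
    for c, row in enumerate(checks):
        for v in row:
            bit_checks[v].append(c)
    c2v = {(c, v): 0.0 for c, row in enumerate(checks) for v in row}

    def posterior():
        return [channel[v] + sum(c2v[(c, v)] for c in bit_checks[v]) for v in range(n)]

    estimate = list(z)
    for _ in range(max_iters):
        total = posterior()
        new_c2v = {}
        for c, row in enumerate(checks):
            # Extrinsic bit-to-check messages: remove what check c sent last time.
            t = {v: math.tanh((total[v] - c2v[(c, v)]) / 2) for v in row}
            sign = -1.0 if syndromes[c] else 1.0
            for v in row:
                prod = 1.0
                for u in row:
                    if u != v:
                        prod *= t[u]
                prod = max(-0.999999, min(0.999999, prod))  # keep atanh finite
                new_c2v[(c, v)] = sign * 2.0 * math.atanh(prod)
        c2v = new_c2v
        estimate = [0 if llr >= 0 else 1 for llr in posterior()]
        if all(sum(estimate[v] for v in row) % 2 == s for row, s in zip(checks, syndromes)):
            break                                  # all parity checks satisfied
    return estimate
```

For example, with the chain of checks `[[0, 1], [1, 2], [2, 3], [3, 4]]` (a length-5 repetition code) and p = 0.1, a single flipped bit in the received word is corrected in the first iteration.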



4 Expected Cardinalities of the Parity-Check Sets 

Suppose we have N available output samples and that B of the information bits 
are found through exhaustive search. The following lemmas give the 
expected cardinalities of the parity-check sets. 

Lemma 1. The expected number of parity-check equations that involve exactly 
one information bit and two output bits is 

{L - B){N - L){N - L - (1) 



Lemma 2. The expected number of parity-check equations that involve exactly 
two output bits and two information bits is 

(L - B){L - B- 1){N - L){N -L- 1)2^-^-"^ (2) 



Lemma 3. The expected number of parity-check equations that involve only 
three output bits is 



{N - Lf{N - L - 1)2^-^-^ 



( 3 ) 



Lemma 4. The expected number of parity-check equations that involve exactly 
three output bits and one information bit from the first (L − B)/2 of the 
L − B unknown initial bits is equal to 



{N - Lf{N -L - 1)2^-'^ 



( 4 ) 



Lemma 5. The expected number of parity-check equations that involve exactly 
three output bits and one information bit from the remaining (L − B)/2 
initial bits is equal to 



{N - Bf{N - L-\)2^-^-^ (5) 

Our simulation results confirm the accuracy of the above approximations when 
N and L are 4000 and 40, respectively. 







M. Noorkami and F. Fekri 



5 Lower Bound on B 



Since finding the information bits through exhaustive search is computationally 
expensive, we would ideally like to minimize the number of those bits. On 
the other hand, reducing B increases the initial-state recovery error rate due 
to the higher degrees of the check nodes. Here, we derive a lower bound on the 
number of information bits that need to be found by exhaustive search when 
the matrix H consists mostly of parity-check equations of weight three and some 
parity-check equations of weight four that involve at least one information bit. 

Ideally, the matrix H should have N columns and N − L rows. Using (3), (2) 
and (1), we choose B such that the expected number of weight-three parity-check 
equations, (3) and (1), plus the expected number of weight-four equations, (2), 
is greater than N − L. Therefore, the following inequality should hold: 






{N - Lf{N -L-l) + {L- B){N - L){N - L - 1) + 



- B){Lb - 1){N - L){N -L-1) 



> N - L 



( 6 ) 



Solving (6), we find that the lower bound on B is 17 when N and L are 4000 and 40, 
respectively. 

Note that it is possible to reduce B further. However, we would then not be able to 
obtain sufficiently many parity-check equations of weight three, and most of the 
parity-check equations would be of weight four. The performance of such a code is 
inferior to that of a code in which most of the parity-check equations have weight three. 

If we reduce B below 17, the number of parity-check equations that 
relate two output bits to two information bits becomes insufficient. Therefore, 
we should employ parity-check equations that involve three output bits and one 
information bit instead. 

We can reduce B to the lowest value for which we can still obtain a few 
parity-check equations for each bit of the LFSR initial state. We employ all the 
parity-check equations we find for each bit and we fill the rest of the matrix H 
with the parity-check equations that involve four output bits. Since the number 
of these parity-check equations is large, they do not put any restriction on B. 
For L = 40 and N = 4000, B can be reduced to 10. 



6 Simulation Results 

Here we present simulation results of our attack on an LFSR of length L = 40 
with the feedback polynomial f{x) = 1 + + x^^ + x^"^ + -I- 

-I- x®^ -I- X®® -I- x®”^ -I- X®® -I- X®® -I- X®® -I- X®® -I- x"^®. This is the same polynomial 
considered in all the previous fast correlation attacks. We show that when the 
majority of the parity-check equations have weight three, the minimum number 
of information bits that need to be found by exhaustive search is 17. 
Table 1 presents the error rate of the LFSR initial-state reconstruction 
as a function of the correlation noise p when B = 17 and B = 18 and the length of 
the available keystream output sequence is 4000. The error rate in Table 1 is 
obtained by averaging over a randomly selected set of 5000 sequences and is 
computed over the information bits only (the initial state of the LFSR). 



Table 1. Error rate for the full recovery of the LFSR initial state. 



p      B = 17   B = 18
0.46   0.000    0.000
0.47   0.240    0.004
0.48   0.990    0.740
0.49   1.000    1.000


To compare the performance of our unequal error protecting LDPC code with that 
of an equal protecting code, we also constructed an equal protecting LDPC code. 
Table 2 summarizes the results for B = 18. As expected, the unequal protecting LDPC 
code performs better than the equal protecting LDPC code. The improvement in the 
recovery of the initial state of the LFSR is achieved at the expense of a higher bit 
error rate on the output bits. 



Table 2. Comparison of an unequal protecting LDPC code with an equal protecting 
LDPC code for B = 18. 



p      Unequal protecting LDPC code   Equal protecting LDPC code
0.45   0.000                          0.000
0.46   0.000                          0.100
0.47   0.004                          1.000
0.48   0.740                          1.000
0.49   1.000                          1.000



When we reduce B to 16, we do not find sufficiently many parity-check equations of 
weight three. We construct the matrix H as before and fill the rest of the matrix 
H with weight-four parity-check equations that involve three output bits and one 
information bit. Since the number of these weight-four parity-check equations 
is large, we can choose those parities for which all three output bits are 
already involved in the decoding. Therefore, the length of the output sequence 
involved in decoding is reduced to 1660 bits, which decreases the decoding 
complexity. The results of decoding this code are given in Table 3. As expected, 
the performance is degraded with respect to the results of Table 1. 

As discussed in the previous section, for the LFSR of length 40 with 16 
feedback taps and an available output sequence of length 4000, B can be reduced 
to 10. The rest of the parity-check matrix is filled with parity-check equations 
that involve four output bits. The performance of this code is also given in 
Table 3. The results in the table suggest that B = 10 can be used if the noise 
level does not exceed 0.43. 



Table 3. Error rate for the full recovery of the LFSR initial state for B = 10 and 
B = 16. 



p      B = 10   B = 16
0.43   0.000    0.000
0.44   0.300    0.000
0.45   1.000    0.003
0.46   1.000    0.006
0.47   1.000    0.998
0.48   1.000    1.000



7 Comparison with the Previous Correlation Attacks 



In this section, we compare our proposed attack with previous fast correlation 
attacks when the LFSR has length 40 and 16 feedback taps. The comparison 
is presented in Table 4. Note that the noise limit of the proposed algorithm is 
nearly the same as in [15] (0.46 versus 0.469), while the required number of output 
samples is only 4000 versus 80,000. The number of samples required in [13] is very 
close to that of the proposed algorithm, but its noise limit is only 0.36. Furthermore, 
the value of B in [13] is more than twice as large as its value in our method. The 
results also suggest that, by applying unequal error protection, the number of 
keystream output bits involved in decoding is smaller than the number of available 
output bits. 



Table 4. Comparison of noise limit and required samples of proposed attack with 
previous approaches. 



Algorithm            Noise limit   Required samples   Specification
[12]                 0.44          400,000            -
[13] OSDA            0.34          40,000             B = 22
[13] IDA             0.36          4096               B = 22
[14]                 0.469         400,000            -
[15]                 0.469         80,000             B = 18
Proposed algorithm   0.46          2558               B = 17
Proposed algorithm   0.44          1660               B = 16
Proposed algorithm   0.43          2735               B = 10

8 Conclusion 

We proposed an improved fast correlation attack technique that is based on 
constructing an unequal error protecting LDPC code from the LFSR keystream 
output sequence. The unequal error protecting LDPC code is constructed such 
that the initial bits of the LFSR have higher degrees than the remainder of the 
output sequence. This provides a lower decoding error rate for the initial bits. 
The constructed code has the additional advantage of reducing the number of 
output bits involved in decoding below the length of the available output sequence. 

We employed parity-check equations of weights three and four only. These 
parities are found with a low-complexity algorithm that is motivated by the 
approach of [15]. Since lower-weight parity-check equations provide better 
performance, we constructed the parity-check matrix such that the majority of 
the parity-check equations have weight three. Our decoding approach is based 
on a combination of exhaustive search over a set of information bits and a 
soft-decision iterative message passing decoding algorithm. 

Our simulation results indicate that the proposed algorithm offers complete 
recovery of the LFSR initial state for noise levels (crossover probability p) as high 
as 0.46 when the length of the available keystream output sequence is 4000 and 17 
information bits are found by exhaustive search. The proposed algorithm is 
compared with recent fast correlation attacks. Our simulation results indicate 
that we can reduce the number of bits found by exhaustive search to about half of 
that required in previous works. Using the expected number of parity-check equations 
of certain weights, we also derived a lower bound on the number of information bits 
that need to be obtained by exhaustive search without compromising the performance. 
Abstract. We present and analyze an adaptive chosen ciphertext secure 
(IND-CCA) identity-based encryption scheme (IBE) based on the 
well-studied Decisional Diffie-Hellman (DDH) assumption. The scheme 
is provably secure in the standard model, assuming the adversary can 
corrupt up to a maximum of k users adaptively. This is in contrast to the 
Boneh-Franklin scheme, whose security holds in the random oracle model. 

Keywords: identity-based encryption, standard model 



1 Introduction 

The idea of an identity-based encryption scheme (IBE) was formulated by Shamir 
[18] in 1984. Shamir’s original motivation was to simplify certificate 
management in email systems. Some additional applications of IBE schemes include key 
escrow/recovery, revocation of public keys and delegation of decryption keys [2,3]. 

An IBE scheme is an asymmetric system wherein the public key is effectively 
replaced by a user’s publicly available identity information or any arbitrary string 
derived from the user’s identity. It enables any pair of users to communicate 
securely without exchanging public or private keys and without keeping 
any key directories. The service of a third party, which we call the Private Key 
Generator (PKG), is needed; its sole purpose is to generate private keys for the 
users. The private key is computed using the PKG’s master-key and the identity 
of the user. Key escrow is inherent in an IBE scheme since the PKG knows the 
private keys of all the users. 

Since the presentation of the idea in 1984, several IBE schemes based on various 
hard problems have emerged in the literature, for example [8,20,21,15,12,5,19]. 
Unfortunately, most of the proposed schemes are impractical. 

Recently, a practical and functional IBE scheme was proposed by Boneh and 
Franklin [2,3]. Their scheme is adaptive chosen ciphertext secure (IND-CCA) in 
the random oracle model based on the Bilinear Diffie-Hellman (BDH) assumption, 
a natural analogue of the computational Diffie-Hellman problem. Specifically, 
the system is based on bilinear maps between groups realized through 
the Weil pairing or Tate pairing. The computational cost of the pairing is high 
compared to that of exponentiation over finite fields. 

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 67-80, 2004. 

However, this scheme is still unsatisfactory due to two main issues: (1) its 
security was not proved in the standard model; (2) there is no evidence that 
the BDH problem is indeed hard. Indeed, a security proof in the random oracle 
model is only a heuristic proof. These types of proofs have some limitations. In 
particular, they do not rule out the possibility of breaking the scheme without 
breaking the underlying intractability assumption. There exist digital signature 
schemes and public key encryption schemes which are secure in the random oracle 
model, but for which any implementation yields insecure schemes, as shown by 
Canetti et al. [4]. It is mentioned in [22] that BDH is reducible to most of the 
older believed-to-be-hard discrete logarithm problems and Diffie-Hellman (DH) 
problems, but there is no known reduction from any of those problems to BDH. 
As a result, we have no evidence that the BDH problem is indeed hard. 

In this paper, we partially answer the open problem posed by Boneh and 
Franklin [2,3], namely the possibility of building a chosen ciphertext secure IBE 
scheme under the standard computation model (rather than the random oracle 
model). We present and analyze an IND-CCA secure IBE scheme based on the 
DDH assumption. Our scheme is k-resilient, which means that a malicious 
adversary can corrupt up to a maximum of k users adaptively and thus possess 
the k corresponding private keys; however, she cannot obtain any information 
pertinent to ciphertexts that are encrypted with public identities that do not 
belong to the corrupt users. 

We adopt the techniques of Cramer-Shoup [6,7] in our construction. More 
precisely, we use a polynomial-based approach as in [13,14,9], but their ultimate 
goal is different from ours in that their concern is more with traitor tracing and 
revocation. For completeness, we also provide IND-CPA secure IBE schemes for 
the non-adaptive setting and the adaptive setting, respectively. The non-adaptive 
IND-CPA scheme is adapted from the El-Gamal scheme [11], and we incorporate 
Pedersen commitments [16] in order to handle adaptive adversaries. 

For the security proof, we adopt a simple variant of the chosen ciphertext 
security definition for IBE system in [2], which is slightly stronger than the 
standard definition for chosen ciphertext security [17]. First, the adversary is 
allowed to obtain from the PKG the private keys for at most k public identities 
of her choice adaptively. This models an adversary who obtains at most k private 
keys corresponding to some public identities of her choice and tries to attack some 
other public identity ID of her choice. Second, the adversary is challenged on an 
arbitrary public identity ID of her choice rather than a random public key. 

A comparison of Boneh-Franklin (BF) scheme and our proposed scheme is 
given in the following table. 
Scheme           Model           Assumption       # of malicious users
BF scheme        Random oracle   BDH assumption   No limit
Proposed scheme  Standard        DDH assumption   At most k



The above table shows a trade-off between (model, assumption) and the 
number of malicious users. The BF scheme requires a stronger (model, assumption) 
pair, but there is no limit on the number of malicious users. Our scheme requires a 
weaker (model, assumption) pair, but the number of malicious users is limited to at 
most k. This limitation is the price of dispensing with random oracles, and 
thus it seems to be unavoidable. 

We argue, however, that the limit on the number of malicious users is not 
a serious problem in the real world. Indeed, it is normally not easy to corrupt a 
large number of users, meaning that the size of a malicious coalition cannot 
be unreasonably large. In another paper [1], Boneh and Franklin mentioned that 
it may suffice for k to be a fairly small integer, e.g., on the order of 20, but this 
applies to the traitor tracing scheme. Since our goal is different from theirs, 
we may use a larger k if necessary, depending on the application; for example, in 
some cases k = 100 may be sufficient. 

Further, our scheme may even be practical in some circumstances. For instance, 
in a company or organization in which the total number of users is small 
and a high security level is of paramount importance, our scheme is preferable 
to the BF scheme, since it is more reliable: it is provably secure 
in the standard model under the well-known DDH assumption (whereas 
the BF scheme is proven secure in the random oracle model based on 
the much less analyzed BDH assumption). Specifically, our scheme can provide 
optimum security in the particular scenario wherein the total number of users n 
is less than or equal to k. 

The efficiency of our scheme is linear in k and independent of the total 
number of users n. In other words, there is a trade-off between the efficiency 
of our scheme and the resilience k, hence the security level. 

Related Work. Independently, Dodis et al. proposed key-insulated encryption 
schemes [10] that coincide with our schemes. However, [10] presents neither a 
formal definition nor a security proof for ID-based encryption. 

The rest of the paper is organized as follows. Some preliminaries such as 
basic facts, definitions and security models are given in Section 2. In Section 
3, we present our proposed k-resilient IND-CPA schemes in the non-adaptive 
and adaptive settings. In Section 4, a k-resilient adaptive IND-CCA scheme is 
presented. Finally, some concluding remarks are made in Section 5. 

2 Preliminaries 

Lagrange Interpolation. Let q be a prime and f(x) a polynomial of degree k 
in $\mathbb{Z}_q$; let $j_0, \ldots, j_k$ be distinct elements in $\mathbb{Z}_q$, and let $f_0 = f(j_0), \ldots, f_k = f(j_k)$. 
Using Lagrange interpolation, we can express the polynomial as 
$f(x) = \sum_{t=0}^{k} f_t \lambda_t(x)$, where $\lambda_t(x) = \prod_{0 \le i \ne t \le k} \frac{x - j_i}{j_t - j_i}$ are 
the Lagrange coefficients. 
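As an illustration of the interpolation formula, a minimal Python sketch over a small prime field; the function name and the toy polynomial are hypothetical.

```python
def lagrange_eval(points, x, q):
    """Evaluate at x the polynomial of degree <= k through k + 1 points, modulo prime q.

    points: list of (j_t, f_t) pairs with distinct j_t in Z_q.
    """
    total = 0
    for t, (jt, ft) in enumerate(points):
        num, den = 1, 1
        for i, (ji, _) in enumerate(points):
            if i != t:
                num = num * (x - ji) % q
                den = den * (jt - ji) % q
        lam = num * pow(den, -1, q) % q   # lambda_t(x); modular inverse needs Python 3.8+
        total = (total + ft * lam) % q
    return total


# Toy example: f(x) = 3 + 2x + 5x^2 over Z_11, known at j = 1, 2, 3.
q = 11


def f(x):
    return (3 + 2 * x + 5 * x * x) % q


pts = [(j, f(j)) for j in (1, 2, 3)]
print(lagrange_eval(pts, 0, q))   # 3, the constant term f(0)
```

With k + 1 evaluations, a degree-k polynomial is determined everywhere; this is the intuition behind tolerating at most k corrupt users when private keys are evaluations of a degree-k polynomial.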

DDH Assumption. The security of our schemes will rely on the Decisional 
Diffie-Hellman (DDH) assumption in a group G: namely, it is computationally 
hard to distinguish a quadruplet $R = (g_1, g_2, u_1, u_2)$ of four independent elements 
in G from a quadruplet $D = (g_1, g_2, u_1, u_2)$ satisfying $\log_{g_1} u_1 = \log_{g_2} u_2$. 
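To make the distinction between D and R concrete, the following sketch tests the defining relation in a toy group where discrete logarithms can be brute-forced. The parameters and function names are hypothetical and chosen only for illustration; in such a tiny group DDH is of course easy, which is exactly what the assumption rules out for properly sized groups.

```python
# Toy group (illustration only): the order-Q subgroup of Z_P^* with P = 2Q + 1.
P, Q = 2039, 1019


def dlog(base, h):
    """Brute-force discrete logarithm of h to the given base (feasible only for toy sizes)."""
    acc = 1
    for x in range(Q):
        if acc == h:
            return x
        acc = acc * base % P
    raise ValueError("h is not in the subgroup generated by base")


def is_dh_quadruple(g1, g2, u1, u2):
    """True exactly when log_{g1} u1 == log_{g2} u2, i.e. the quadruple has the form D."""
    return dlog(g1, u1) == dlog(g2, u2)


g1 = 4                      # 2^2, a generator of the order-Q subgroup
g2 = pow(g1, 7, P)          # another generator
r = 5
print(is_dh_quadruple(g1, g2, pow(g1, r, P), pow(g2, r, P)))  # True: a D quadruple
print(is_dh_quadruple(g1, g2, pow(g1, 2, P), pow(g2, 3, P)))  # False: exponents differ
```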

Collision-Resistant Hash Function. A family of hash functions is said to be 
collision resistant if, given a randomly chosen hash function H from the family, 
it is infeasible for an adversary to find two distinct messages m and m' such that 
H(m) = H(m'). 

IBE Scheme. An identity-based encryption scheme IBE is specified by four 
polynomially bounded algorithms: Setup, Extract, Encrypt, Decrypt where: 

Setup: a probabilistic algorithm used by the PKG to set up all the parameters 
of the scheme. The Setup algorithm takes as input a security parameter 
and a number k (i.e. the maximum number of users that can be corrupted) 
and generates the global system parameters params and master-key. The 
system parameters will be publicly known while the master-key will be known 
to the PKG only. 

Extract: a probabilistic algorithm used by the PKG to extract a private key 
corresponding to a given public identity. The Extract algorithm receives as 
input the master-key and a public identity ID associated with the user; it 
returns the user’s private key $SK_{\mathrm{ID}}$. 

Encrypt: a probabilistic algorithm used to encrypt a message m using a public 
identity ID. The Encrypt algorithm takes as input the system parameters 
params, a public identity ID and a message m, and returns the ciphertext C. 

Decrypt: a deterministic algorithm that takes as input the system parameters 
params, the private key $SK_{\mathrm{ID}}$ and the ciphertext C, and returns the message 
m. We require that for all messages m, Decrypt(params, $SK_{\mathrm{ID}}$, C) = m where 
C = Encrypt(params, ID, m). 

Security. Chosen ciphertext security (IND-CCA) is the strongest notion of 
security for a public key encryption scheme. Hence, it is desirable to devise 
an IND-CCA secure IBE scheme. However, the definition of chosen ciphertext 
security in an identity-based system must be strengthened a bit for the following 
reason. When an adversary attacks a public identity ID, she might already 
possess the private keys of users $\mathrm{ID}_1, \mathrm{ID}_2, \ldots, \mathrm{ID}_k$ of her choice. We refer to 
these users as corrupt users. Hence, the definition of IND-CCA must allow the 
adversary to issue a maximum of k private key extraction queries adaptively. 
That is, the adversary is permitted to obtain the private keys associated with 
a maximum of k public identities of her choice adaptively (other than the 
public identity ID being attacked). Another difference is that the adversary 
is challenged on a public identity ID of her choice as opposed to a random 
public key. The two amendments apply to the adaptive IND-CPA definition as well, 
based on the same reasoning. We give the attack scenarios for IND-CPA and 
IND-CCA as follows: 

IND-CPA: First, Setup is run and the adversary A is given the system parameters 
params. Then, A enters the private key extraction query stage, where she 
is given oracle access to the extraction oracle. This oracle receives as input the 
public identity $\mathrm{ID}_i$ and returns the corresponding private key $SK_i$. This oracle 
can be called adaptively at most k times. 

In the second stage, A can query the encryption oracle (also known as the 
left-or-right oracle) on any pair of messages $m_0, m_1$ and an identity ID on which 
it wishes to be challenged.¹ Then, σ is chosen at random from {0,1} and the 
encryption oracle returns the challenge ciphertext $C^* = \mathrm{Encrypt}(\mathrm{params}, \mathrm{ID}, m_\sigma)$. 
Without loss of generality, we can assume that the encryption oracle is called 
exactly once. At the end of this stage, A outputs a bit σ* which she thinks is 
equal to σ. Define the advantage of A as $\mathrm{Adv}^{\mathrm{IND\text{-}CPA}}_{\mathrm{IBE}}(A) := |\Pr[\sigma^* = \sigma] - 1/2|$. 

Note: For non-adaptive IND-CPA security, we must assume that the 
adversary has successfully corrupted the maximum of k users and thus obtained 
the k corresponding private keys before Setup takes place, i.e., before the 
adversary learns the system parameters params. This is a weaker notion of security. 
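The adaptive IND-CPA game can be summarized as an executable harness. In this sketch (ours, not from the paper), `setup`, `extract` and `encrypt` are caller-supplied functions, and the adversary is a hypothetical object with `choose` and `guess` methods; the advantage is estimated empirically over `trials` runs, so it only approximates the quantity defined above.

```python
import secrets


def ind_cpa_advantage(setup, extract, encrypt, adversary, k, trials=1000):
    """Estimate |Pr[sigma* = sigma] - 1/2| in the adaptive IND-CPA game with k corruptions."""
    wins = 0
    for _ in range(trials):
        params, master_key = setup(k)
        corrupted = []

        def extract_oracle(identity):
            # First stage: at most k adaptive private-key extraction queries.
            if len(corrupted) >= k:
                raise RuntimeError("more than k extraction queries")
            corrupted.append(identity)
            return extract(master_key, identity)

        m0, m1, target, state = adversary.choose(params, extract_oracle)
        if target in corrupted:
            raise RuntimeError("the challenge identity must not be corrupted")
        sigma = secrets.randbelow(2)
        challenge = encrypt(params, target, (m0, m1)[sigma])  # left-or-right oracle, called once
        wins += adversary.guess(state, challenge) == sigma
    return abs(wins / trials - 0.5)
```

For example, plugging in an "encryption" that returns the message unchanged lets an adversary that simply echoes the challenge win every run, giving the maximal advantage 1/2.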

IND-CCA: The attack scenario is almost the same as that in the adaptive 
IND-CPA, except that now A also has access to the decryption oracle, which she can 
query on any pair $(\mathrm{ID}_i, C_i)$ of her choice. A can call this oracle at any point during 
the execution, both in the first and in the second stage, arbitrarily interleaved 
with her other oracle calls. To prevent the adversary from directly decrypting 
her challenge ciphertext C*, the adversary is not allowed to query the decryption 
oracle on the pair (ID, C*) output by the encryption oracle (i.e., 
$(\mathrm{ID}_i, C_i) \ne (\mathrm{ID}, C^*)$). As before, we define the advantage as 
$\mathrm{Adv}^{\mathrm{IND\text{-}CCA}}_{\mathrm{IBE}}(A) := |\Pr[\sigma^* = \sigma] - 1/2|$. 

Definition 1. (k-resilience of an IBE Scheme) 

Let μ ∈ {IND-CPA, IND-CCA}. We say that an IBE scheme is k-resilient 
against a μ-type attack if the advantage, $\mathrm{Adv}^{\mu}_{\mathrm{IBE}}(A)$, of any probabilistic 
polynomial-time (PPT) algorithm A is a negligible function of λ. 

Before continuing, we state the following useful lemma, which will be 
referred to later. 

Lemma 1. Let $U_1$, $U_2$ and F be events defined on some probability space. 
Suppose that $(U_1 \wedge \neg F)$ and $(U_2 \wedge \neg F)$ are equivalent events. Then 
$|\Pr[U_1] - \Pr[U_2]| \le \Pr[F]$. 
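For completeness, the standard one-line argument (our own write-up, not quoted from the paper): split each probability on F and use $\Pr[U_1 \wedge \neg F] = \Pr[U_2 \wedge \neg F]$; the remaining two terms both lie in $[0, \Pr[F]]$, so their difference is at most $\Pr[F]$ in absolute value.

```latex
\begin{aligned}
\lvert \Pr[U_1] - \Pr[U_2] \rvert
  &= \lvert \Pr[U_1 \wedge F] + \Pr[U_1 \wedge \neg F]
          - \Pr[U_2 \wedge F] - \Pr[U_2 \wedge \neg F] \rvert \\
  &= \lvert \Pr[U_1 \wedge F] - \Pr[U_2 \wedge F] \rvert \\
  &\le \Pr[F].
\end{aligned}
```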

¹ For the sake of generality, we could have allowed A to interleave the calls to the 
extraction oracle and the encryption oracle. However, this definition is equivalent to 
the one we present. 
3 k-Resilient IND-CPA Scheme

In this section, we present two IBE schemes: a basic scheme which is $k$-resilient
against non-adaptive IND-CPA attacks, and an adaptive IND-CPA scheme. Each
subsequent scheme can be built on the previous one in an incremental way, so
that it is possible to obtain increasing security at the cost of a slight loss of
efficiency.

3.1 Non-adaptive IND-CPA Scheme (Basic Scheme) 

First we describe the basic scheme, which is semantically secure against chosen
plaintext attacks, assuming that the DDH problem is hard in the group $G$. This
scheme is $k$-resilient in a non-adaptive setting.

Setup: Given a security parameter $1^\lambda$ and $k$, the algorithm works as follows.
The first step is to define a multiplicative group $G$ of prime order $q$ such
that $p = 2q + 1$ is also prime, in which DDH is believed to hold. This is
accomplished by selecting a random prime $q$ with the above two properties and
a random element $g$ of order $q$ modulo $p$. The group $G$ is then set to be the
subgroup of $\mathbb{Z}_p^*$ generated by $g$, i.e., $G = \{g^i \bmod p : i \in \mathbb{Z}_q\} \subset \mathbb{Z}_p^*$. Then, a
random $k$-degree polynomial $f(x) = \sum_{t=0}^{k} d_t x^t$ is chosen over $\mathbb{Z}_q$. Finally,
the algorithm publicizes the system parameters $params = (g, g^{d_0}, \ldots, g^{d_k})$.
The master-key is $f(x)$, which is known to the PKG only.

Extract: For a given public identity $\mathrm{ID} \in \mathbb{Z}_q$, the algorithm computes
$f_{\mathrm{ID}} = f(\mathrm{ID})$.

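Encrypt: To encrypt a message $m \in G$ under the public identity ID, the
algorithm computes $D_{\mathrm{ID}} = \prod_{t=0}^{k} (g^{d_t})^{\mathrm{ID}^t} = g^{f(\mathrm{ID})}$. Next, it selects $r \in \mathbb{Z}_q$
at random and sets the ciphertext as $C = (g^r, m \cdot D_{\mathrm{ID}}^{r})$.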

Decrypt: Let $C = (c_1, c_2)$ be a ciphertext encrypted using the public identity
ID. To decrypt $C$ using the private key $f_{\mathrm{ID}}$, the algorithm computes
$m = c_2 / c_1^{f_{\mathrm{ID}}}$.

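Recall that $D = (g_1, g_2, g_1^r, g_2^r)$ and $R = (g_1, g_2, g_1^a, g_2^b)$, where $g_1, g_2$ are
generators and $a$, $b$ and $r$ are chosen at random over $\mathbb{Z}_q$. In the proof of the
following theorem, by Lagrange interpolation, $f(x)$ can be expressed as
$f(x) = \sum_{t=0}^{k} f_t \lambda_t(x)$, where $f_t = f(\mathrm{ID}_t)$ and
$\lambda_t(x) = \prod_{0 \le j \le k,\ j \ne t} \frac{x - \mathrm{ID}_j}{\mathrm{ID}_t - \mathrm{ID}_j}$ for $t = 0, \ldots, k$.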
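The interpolation identity can be checked numerically. This is a minimal sketch assuming a hypothetical toy modulus $q = 1019$ and hypothetical identities (with $\mathrm{ID}_0 = 0$, as used in the proof); `lagrange_coeff` and `poly_eval` are our own helper names.

```python
import secrets

Q = 1019  # hypothetical toy prime modulus for Z_q (illustration only)

def lagrange_coeff(ids, t, x):
    """lambda_t(x) = prod_{j != t} (x - ID_j) / (ID_t - ID_j) mod q."""
    num, den = 1, 1
    for j, id_j in enumerate(ids):
        if j != t:
            num = num * (x - id_j) % Q
            den = den * (ids[t] - id_j) % Q
    return num * pow(den, -1, Q) % Q

def poly_eval(coeffs, x):
    """Horner evaluation of sum_t coeffs[t] * x^t mod q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc

k = 3
f = [secrets.randbelow(Q) for _ in range(k + 1)]   # random degree-k polynomial
ids = [0, 5, 17, 99]                                 # ID_0 = 0 and k distinct IDs
shares = [poly_eval(f, i) for i in ids]              # f_t = f(ID_t)

x = 42
interp = sum(s * lagrange_coeff(ids, t, x) for t, s in enumerate(shares)) % Q
assert interp == poly_eval(f, x)
```

Any $k + 1$ distinct points determine a degree-$k$ polynomial, which is why $k$ corrupted keys plus the unknown $f_0 = d_0$ suffice in the proof.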

Theorem 1. The above basic scheme is $k$-resilient against non-adaptive
chosen plaintext attacks (IND-CPA) under the DDH assumption.

Proof. Suppose that the adversary $A$ successfully attacks our encryption algorithm
in terms of non-adaptive IND-CPA security. We show that there is a PPT
algorithm $A_1$ that distinguishes $D$ from $R$ with a non-negligible advantage $\epsilon_1$.

Assume that $A$ corrupts up to $k$ users of her choice and hence obtains the
$k$ corresponding private keys. Given the $k$ private keys and the system
parameters $params = (g, g^{d_0}, \ldots, g^{d_k})$, $A$ finds two messages $m_0$ and $m_1$
in $G$ and outputs an identity ID on which she wishes to be challenged, such that
she can distinguish them by observing the ciphertext.

Let $(g_1, g_2, u_1, u_2)$ be the input of the DDH problem. The following algorithm
$A_1$ shall decide whether $(g_1, g_2, u_1, u_2)$ is from $D$ or $R$.

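1. Choose $k$ private keys $f_1, \ldots, f_k$ at random, corresponding to the $k$ corrupted
users' public identities $\mathrm{ID}_1, \ldots, \mathrm{ID}_k$. Let $g = g_1$ and $g^{d_0} = g_2$. Compute
$g^{d_1}, \ldots, g^{d_k}$ as follows. Notice that $f_1, \ldots, f_k$ can be written in matrix
form as

$$\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_k \end{pmatrix} = \begin{pmatrix} d_0 \\ d_0 \\ \vdots \\ d_0 \end{pmatrix} + \underbrace{\begin{pmatrix} \mathrm{ID}_1 & \mathrm{ID}_1^2 & \cdots & \mathrm{ID}_1^k \\ \mathrm{ID}_2 & \mathrm{ID}_2^2 & \cdots & \mathrm{ID}_2^k \\ \vdots & \vdots & & \vdots \\ \mathrm{ID}_k & \mathrm{ID}_k^2 & \cdots & \mathrm{ID}_k^k \end{pmatrix}}_{M} \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{pmatrix}$$

where $M$ is a Vandermonde matrix (with row $i$ scaled by $\mathrm{ID}_i$). It is clear that $M$
is non-singular, since $\mathrm{ID}_1, \ldots, \mathrm{ID}_k$ are distinct and nonzero. Therefore, we have

$$(d_1, \ldots, d_k)^T = M^{-1}(f_1 - d_0, \ldots, f_k - d_0)^T.$$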

Let $(b_{t1}, \ldots, b_{tk})$ be the $t$-th row of $M^{-1}$. Then

$$d_t = b_{t1}(f_1 - d_0) + \cdots + b_{tk}(f_k - d_0) = b_{t1}f_1 + \cdots + b_{tk}f_k - (b_{t1} + \cdots + b_{tk})d_0.$$

Hence, $g^{d_t} = g_1^{b_{t1}f_1 + \cdots + b_{tk}f_k} / g_2^{b_{t1} + \cdots + b_{tk}}$ for $t = 1, 2, \ldots, k$.

Let $f'(x) = \sum_{t=1}^{k} f_t \lambda_t(x)$ and $f(x) = f'(x) + d_0 \lambda_0(x)$, where $\lambda_t(x)$ is
computed from $\mathrm{ID}_0 = 0$ and $\mathrm{ID}_1, \ldots, \mathrm{ID}_k$. Note that we do not know $d_0 = f_0$.

2. Feed the private keys $f_1, \ldots, f_k$ and the system parameters
$params = (g_1, g_2, g^{d_1}, \ldots, g^{d_k})$ to $A$. $A$ returns $m_0, m_1 \in G$ and an identity
ID such that $\mathrm{ID} \notin \{\mathrm{ID}_1, \ldots, \mathrm{ID}_k\}$.

3. Randomly select $\sigma \in \{0, 1\}$ and encrypt $m_\sigma$ as $C^* = (u_1, m_\sigma u_1^{f'(\mathrm{ID})} u_2^{\lambda_0(\mathrm{ID})})$.

4. Feed $C^*$ to $A$ and get a return $\sigma^*$. The algorithm outputs 1 if and only if $\sigma^* = \sigma$.
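Step 1 can be sanity-checked numerically: $A_1$ produces $g^{d_1}, \ldots, g^{d_k}$ from $g_1$, $g_2 = g^{d_0}$ and its own choice of keys, without ever learning $d_0$. This is a minimal sketch assuming a hypothetical toy group ($q = 1019$, $p = 2039$, $g = 4$) and hypothetical identities; `mat_inv_mod` and `simulate_params` are our own names, and the check at the end uses $d_0$ only to verify the result.

```python
import secrets

Q, P, g = 1019, 2039, 4   # hypothetical toy group: q, p = 2q + 1, generator of order q

def mat_inv_mod(m, q):
    """Inverse of a square matrix over Z_q (q prime) by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] % q)
        a[col], a[piv] = a[piv], a[col]
        inv = pow(a[col][col], -1, q)
        a[col] = [v * inv % q for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                factor = a[r][col]
                a[r] = [(vr - factor * vc) % q for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def simulate_params(g1, g2, ids, keys):
    """A_1's step 1: g^{d_1..d_k} from g1 = g, g2 = g^{d_0} and chosen keys f_i = f(ID_i)."""
    k = len(ids)
    m = [[pow(i, j, Q) for j in range(1, k + 1)] for i in ids]   # M[i][j] = ID_i^j
    b = mat_inv_mod(m, Q)                                          # row t of M^{-1}
    out = []
    for row in b:
        num = pow(g1, sum(bt * ft for bt, ft in zip(row, keys)) % Q, P)
        den = pow(g2, sum(row) % Q, P)
        out.append(num * pow(den, -1, P) % P)                       # g^{d_t}
    return out

# Sanity check: pick d_0 secretly (A_1 never sees it) and compare with the
# polynomial actually determined by d_0 and the chosen keys.
d0 = secrets.randbelow(Q)
ids = [5, 17, 99]
keys = [secrets.randbelow(Q) for _ in ids]
sim = simulate_params(g, pow(g, d0, P), ids, keys)
b = mat_inv_mod([[pow(i, j, Q) for j in range(1, 4)] for i in ids], Q)
d = [d0] + [sum(bt * (ft - d0) for bt, ft in zip(row, keys)) % Q for row in b]
assert all(sum(dt * pow(i, t, Q) for t, dt in enumerate(d)) % Q == f
           for i, f in zip(ids, keys))
assert sim == [pow(g, dt, P) for dt in d[1:]]
```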



If $(g_1, g_2, u_1, u_2)$ is from $D$, then $g = g_1$, $g_2 = g^{d_0}$, $u_1 = g^r$, $u_2 = g_2^r = g^{r d_0}$,
and $u_1^{f'(\mathrm{ID})} u_2^{\lambda_0(\mathrm{ID})} = g^{r f(\mathrm{ID})}$. Thus, $C^*$ is the encryption of $m_\sigma$, and
$\Pr[A_1(g_1, g_2, u_1, u_2) = 1] = \Pr[A(C^*) = \sigma] = 1/2 + \epsilon_1$. Otherwise, since $u_1 = g_1^a$ and
$u_2 = g_2^b$, the distribution of $C^*$ is the same for both $\sigma = 0$ and $\sigma = 1$. Thus,
$\Pr[A_1(g_1, g_2, u_1, u_2) = 1] = \Pr[A(C^*) = \sigma] = 1/2$. Therefore, $A_1$ distinguishes $D$ from $R$
with a non-negligible advantage $\epsilon_1$. □
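The key identity in the case where the input comes from $D$, namely $u_1^{f'(\mathrm{ID})} u_2^{\lambda_0(\mathrm{ID})} = g^{r f(\mathrm{ID})}$, can be checked on a toy instance. This sketch assumes the same hypothetical toy group as above, hypothetical identities 5, 17, 99 (corrupted) and 42 (challenge), and our own helper names; it only covers the genuine-DDH-tuple case.

```python
import secrets

Q, P, g = 1019, 2039, 4   # hypothetical toy group: q, p = 2q + 1, generator of order q

def lagrange_coeff(ids, t, x):
    """lambda_t(x) over Z_q for the interpolation points ids."""
    num, den = 1, 1
    for j, id_j in enumerate(ids):
        if j != t:
            num = num * (x - id_j) % Q
            den = den * (ids[t] - id_j) % Q
    return num * pow(den, -1, Q) % Q

def challenge(u1, u2, corrupt_ids, corrupt_keys, target_id, m0, m1):
    """A_1's step 3: C* = (u1, m_sigma * u1^{f'(ID)} * u2^{lambda_0(ID)})."""
    ids = [0] + list(corrupt_ids)                 # ID_0 = 0, then ID_1..ID_k
    f_prime = sum(f * lagrange_coeff(ids, t, target_id)
                  for t, f in enumerate(corrupt_keys, start=1)) % Q
    lam0 = lagrange_coeff(ids, 0, target_id)
    sigma = secrets.randbelow(2)
    mask = pow(u1, f_prime, P) * pow(u2, lam0, P) % P
    return sigma, (u1, (m0, m1)[sigma] * mask % P)

coeffs = [secrets.randbelow(Q) for _ in range(4)]  # f of degree k = 3; d_0 = coeffs[0]

def f(x):
    return sum(c * pow(x, t, Q) for t, c in enumerate(coeffs)) % Q

ids = [5, 17, 99]
r = secrets.randbelow(Q)
u1, u2 = pow(g, r, P), pow(g, r * coeffs[0], P)    # DDH tuple (g, g^{d_0}, g^r, g^{r d_0})
m0, m1 = pow(g, 7, P), pow(g, 8, P)
sigma, (c1, c2) = challenge(u1, u2, ids, [f(i) for i in ids], 42, m0, m1)
assert c2 * pow(c1, -f(42), P) % P == (m0, m1)[sigma]   # valid encryption of m_sigma
```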
3.2 Adaptive IND-CPA Scheme

In this section, we present an adaptive IND-CPA secure IBE scheme with the 
condition that the adversary can corrupt up to a maximum of k users adaptively. 

For this scheme and the subsequent scheme, our proofs follow the structural
approach advocated in [7], defining a sequence of attack games $G_0, G_1, \ldots, G_l$,
all operating over the same underlying probability space. Starting from $G_0$, we
make slight modifications to the behavior of the oracles, thus changing the way
the adversary's view is computed, while keeping the distributions of the view
indistinguishable among the games. We emphasize that the games modify only
the encryption oracle (and the decryption oracle), i.e., the way in which the
challenge ciphertext is generated (and ciphertexts are decrypted), and not the
encryption (and decryption) algorithm itself. The actual encryption (and
decryption) algorithm that the scheme, and hence the attacker, uses remains
the same.

For any $0 \le i \le l$, let $T_i$ be the event that $\sigma = \sigma^*$ in game $G_i$. Our strategy
is to show that for $1 \le i \le l$, the quantity $|\Pr[T_i] - \Pr[T_{i-1}]|$ is negligible. Also,
it will be evident from the definition of game $G_l$ that $\Pr[T_l] = 1/2$, which will
imply that $|\Pr[T_0] - 1/2|$ is negligible.
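The last implication is just the triangle inequality applied along the game sequence:

```latex
\Bigl|\Pr[T_0]-\tfrac12\Bigr|
 \le \sum_{i=1}^{l}\bigl|\Pr[T_i]-\Pr[T_{i-1}]\bigr| + \Bigl|\Pr[T_l]-\tfrac12\Bigr|
 = \sum_{i=1}^{l}\bigl|\Pr[T_i]-\Pr[T_{i-1}]\bigr| ,
```

so a constant number of game hops, each with negligible difference, gives a negligible advantage.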

Setup: It is almost the same as in the basic scheme, except that in this scheme
we use two generators. This is accomplished by selecting a random element $g_1$ of
order $q$ modulo $p$. The group $G$ is set to be the subgroup of $\mathbb{Z}_p^*$ generated by
$g_1$, i.e., $G = \{g_1^i \bmod p : i \in \mathbb{Z}_q\} \subset \mathbb{Z}_p^*$. A random $w \in_R \mathbb{Z}_q$ is then chosen
and used to compute $g_2 = g_1^w$. Then, two random $k$-degree polynomials
$p_1(x) = \sum_{t=0}^{k} d_t x^t$ and $p_2(x) = \sum_{t=0}^{k} d'_t x^t$ are chosen over $\mathbb{Z}_q$. Next, the
algorithm computes $D_0 = g_1^{d_0} g_2^{d'_0}, \ldots, D_k = g_1^{d_k} g_2^{d'_k}$. Finally, it publicizes
the system parameters as $params = (g_1, g_2, D_0, \ldots, D_k)$. The master-key is
$(p_1, p_2)$, which is known to the PKG only.

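Extract: For a given public identity $\mathrm{ID} \in \mathbb{Z}_q$, the algorithm computes $p_{1,\mathrm{ID}} =
p_1(\mathrm{ID})$ and $p_{2,\mathrm{ID}} = p_2(\mathrm{ID})$,² and returns $SK_{\mathrm{ID}} = (p_{1,\mathrm{ID}}, p_{2,\mathrm{ID}})$.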

Encrypt: To encrypt a message $m \in G$ under the public identity ID, the
working steps of the encryption algorithm are given in Fig. 1.

Decrypt: The decryption of $C$ using the private key $SK_{\mathrm{ID}} = (p_{1,\mathrm{ID}}, p_{2,\mathrm{ID}})$
is depicted in Fig. 1.



Theorem 2. The above IBE scheme is k-resilient against adaptive chosen plain- 
text attacks (IND-CPA) under the DDH assumption. 

Proof. We shall define a sequence of “indistinguishable” modified attack games
$G_0$, $G_1$, $G_2$ and $G_3$, where $G_0$ is the original game and the last game clearly
gives no advantage to the adversary.

Game $G_0$. In game $G_0$, the adversary $A$ receives the system parameters
$params = (g_1, g_2, D_0, \ldots, D_k)$ and adaptively queries the extraction oracle for

² For conciseness, we will follow this notation throughout the paper.
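Encryption algorithm
E1. $r_1 \leftarrow_R \mathbb{Z}_q$
E2. $u_1 \leftarrow g_1^{r_1}$
E3. $u_2 \leftarrow g_2^{r_1}$
E4. $D_{\mathrm{ID}} \leftarrow \prod_{t=0}^{k} D_t^{\mathrm{ID}^t}$
E5. $s \leftarrow D_{\mathrm{ID}}^{r_1}$
E6. $c \leftarrow m \cdot s$
E7. $C \leftarrow (u_1, u_2, c)$

Decryption algorithm
D1. $s \leftarrow u_1^{p_{1,\mathrm{ID}}} \cdot u_2^{p_{2,\mathrm{ID}}}$
D2. $m \leftarrow c \cdot s^{-1}$

Fig. 1. Encryption and decryption algorithms for the IND-CPA scheme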



a maximum of $k$ public identities of her choice. Then, she outputs a challenge
identity ID and queries the encryption oracle on $(m_0, m_1)$. $A$ receives the
ciphertext $C^*$ as the answer. At this point, $A$ outputs her guess $\sigma^* \in \{0, 1\}$.
Let $T_0$ be the event that $\sigma = \sigma^*$ in game $G_0$.

Game $G_1$. Game $G_1$ is identical to game $G_0$, except for a small modification to
the encryption oracle. In game $G_1$, steps E4 and E5 of the encryption algorithm
in Fig. 1 are replaced with the following step:

E5'. $s \leftarrow u_1^{p_{1,\mathrm{ID}}} \cdot u_2^{p_{2,\mathrm{ID}}}$

It is clear that step E5' computes the same value as step E5. The point of this
change is to make explicit any functional dependency of the above quantity on
$u_1$ and $u_2$. Let $T_1$ be the event that $\sigma = \sigma^*$ in game $G_1$. Clearly, it holds that
$\Pr[T_0] = \Pr[T_1]$.
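The equality of E5 and E5' can be written out explicitly, taking the value combined in steps E4 and E5 to be $\bigl(\prod_{t=0}^{k} D_t^{\mathrm{ID}^t}\bigr)^{r_1}$ (our reading of Fig. 1):

```latex
\begin{aligned}
\Bigl(\prod_{t=0}^{k} D_t^{\mathrm{ID}^t}\Bigr)^{r_1}
 &= \Bigl(\prod_{t=0}^{k}\bigl(g_1^{d_t}g_2^{d'_t}\bigr)^{\mathrm{ID}^t}\Bigr)^{r_1}
  = g_1^{r_1 p_1(\mathrm{ID})}\, g_2^{r_1 p_2(\mathrm{ID})} \\
 &= \bigl(g_1^{r_1}\bigr)^{p_{1,\mathrm{ID}}}\bigl(g_2^{r_1}\bigr)^{p_{2,\mathrm{ID}}}
  = u_1^{p_{1,\mathrm{ID}}}\, u_2^{p_{2,\mathrm{ID}}}.
\end{aligned}
```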

Game $G_2$. To turn game $G_1$ into game $G_2$, we make another change to the
encryption oracle. We replace E1 and E3 with the following:

E1'. $r_1 \leftarrow_R \mathbb{Z}_q$, $r_2 \leftarrow_R \mathbb{Z}_q \setminus \{r_1\}$
E3'. $u_2 \leftarrow g_2^{r_2}$

Let $T_2$ be the event that $\sigma = \sigma^*$ in game $G_2$. Notice that while in game $G_1$ the
values $u_1$ and $u_2$ are obtained using the same value $r_1$, in game $G_2$ they are
independent subject to $r_1 \ne r_2$. Therefore, using a standard reduction argument, any
non-negligible difference in behaviour between $G_1$ and $G_2$ can be used to
construct a PPT algorithm $A_1$ that is able to distinguish Diffie-Hellman tuples from
totally random tuples with non-negligible advantage. Hence $|\Pr[T_2] - \Pr[T_1]| \le \epsilon_1$
for some negligible $\epsilon_1$.

Game $G_3$. In this game, we again modify the encryption oracle as follows:

E6'. $e \leftarrow_R \mathbb{Z}_q$, $c \leftarrow g_1^e$

Let $T_3$ be the event that $\sigma = \sigma^*$ in game $G_3$. Due to this last change, the
challenge no longer contains $\sigma$, nor does any other information in the adversary's
view; therefore, we have that $\Pr[T_3] = 1/2$. Moreover, we can prove that the
adversary has the same chance of guessing $\sigma$ in both games $G_2$ and $G_3$, i.e.,
$\Pr[T_3] = \Pr[T_2]$. (The proof will be given in the full version of the paper.)
Finally, combining all the intermediate results, we can conclude that adversary
$A$'s advantage is negligible; more precisely: $\mathrm{Adv}^{\text{IND-CPA}}_{\mathrm{IBE}}(A) \le \epsilon_1$. □



4 k-Resilient IND-CCA Scheme

We present an identity-based encryption scheme achieving adaptive chosen
ciphertext security in this section. This scheme makes use of a hash function chosen
randomly from a family of collision-resistant hash functions. Again, this adaptive
IND-CCA IBE scheme is secure against an adversary who can corrupt a
maximum of $k$ users adaptively. We give the description of the scheme as follows:

Setup: As in the previous scheme, the first task is to select a random
multiplicative group $G \subset \mathbb{Z}_p^*$ of prime order $q$ and two generators $g_1, g_2 \in G$.
Then, six random $k$-degree polynomials are chosen over $\mathbb{Z}_q$. That is,
$f_1(x) = \sum_{t=0}^{k} a_t x^t$, $f_2(x) = \sum_{t=0}^{k} a'_t x^t$, $h_1(x) = \sum_{t=0}^{k} b_t x^t$, $h_2(x) = \sum_{t=0}^{k} b'_t x^t$,
$p_1(x) = \sum_{t=0}^{k} d_t x^t$ and $p_2(x) = \sum_{t=0}^{k} d'_t x^t$. Next, the algorithm computes
$A_t = g_1^{a_t} g_2^{a'_t}$, $B_t = g_1^{b_t} g_2^{b'_t}$ and $D_t = g_1^{d_t} g_2^{d'_t}$ for $t = 0, \ldots, k$,
and chooses at random a hash function $H$ from a family of collision-resistant
hash functions. Finally, it publicizes the system parameters

$params = (g_1, g_2, A_0, \ldots, A_k, B_0, \ldots, B_k, D_0, \ldots, D_k, H)$.

The master-key is $(f_1, f_2, h_1, h_2, p_1, p_2)$, which is known to the PKG only.

Extract: For a given public identity $\mathrm{ID} \in \mathbb{Z}_q$, the algorithm returns

$SK_{\mathrm{ID}} = (f_{1,\mathrm{ID}}, f_{2,\mathrm{ID}}, h_{1,\mathrm{ID}}, h_{2,\mathrm{ID}}, p_{1,\mathrm{ID}}, p_{2,\mathrm{ID}})$.

Encrypt: To encrypt a message $m \in G$ under the public identity ID, the
encryption algorithm works as depicted in Fig. 2.

Decrypt: The decryption of $C$ using the private key
$SK_{\mathrm{ID}} = (f_{1,\mathrm{ID}}, f_{2,\mathrm{ID}}, h_{1,\mathrm{ID}}, h_{2,\mathrm{ID}}, p_{1,\mathrm{ID}}, p_{2,\mathrm{ID}})$ is given
in Fig. 2.

Theorem 3. The above IBE scheme is $k$-resilient against adaptive chosen
ciphertext attacks (IND-CCA), provided that the DDH assumption holds for $G$ and $H$
is chosen from a family of collision-resistant hash functions.

Proof. As in the proof of Theorem 2, we shall define a sequence of modified
games $G_i$, for $0 \le i \le 5$. Let $T_i$ be the event that $\sigma = \sigma^*$ in game $G_i$.

Game $G_0$. In game $G_0$, the adversary $A$ receives the system parameters
$params = (g_1, g_2, A_0, \ldots, A_k, B_0, \ldots, B_k, D_0, \ldots, D_k, H)$ and adaptively
interleaves queries to the extraction oracle and the decryption oracle. For a private
key extraction query, $A$ inputs a public identity $\mathrm{ID}_i$ of her choice, while for
a decryption query, $A$ provides the oracle with a public identity and ciphertext
pair $(\mathrm{ID}_i, C_i)$ of her choice. $A$ can query the extraction oracle a
maximum of $k$ times adaptively. Then, $A$ outputs a challenge identity ID and
queries the encryption oracle on $(m_0, m_1)$. She receives the ciphertext $C^*$ as the
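Encryption algorithm
E1. $r_1 \leftarrow_R \mathbb{Z}_q$
E2. $u_1 \leftarrow g_1^{r_1}$
E3. $u_2 \leftarrow g_2^{r_1}$
E4. $D_{\mathrm{ID}} \leftarrow \prod_{t=0}^{k} D_t^{\mathrm{ID}^t}$
E5. $s \leftarrow D_{\mathrm{ID}}^{r_1}$
E6. $c \leftarrow m \cdot s$
E7. $\alpha \leftarrow H(u_1, u_2, c)$
E8. $v_{\mathrm{ID}} \leftarrow \prod_{t=0}^{k} (A_t B_t^{\alpha})^{r_1 \mathrm{ID}^t}$
E9. $C \leftarrow (u_1, u_2, c, v_{\mathrm{ID}})$

Decryption algorithm
D1. $\alpha \leftarrow H(u_1, u_2, c)$
D2. Test if $v_{\mathrm{ID}} = u_1^{f_{1,\mathrm{ID}} + h_{1,\mathrm{ID}}\alpha} \cdot u_2^{f_{2,\mathrm{ID}} + h_{2,\mathrm{ID}}\alpha}$;
    halt if this is not the case
D3. $s \leftarrow u_1^{p_{1,\mathrm{ID}}} \cdot u_2^{p_{2,\mathrm{ID}}}$
D4. $m \leftarrow c \cdot s^{-1}$

Fig. 2. Encryption and decryption algorithms for the IND-CCA scheme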



answer. Next, $A$ can again query the decryption oracle, restricted only in that
$(\mathrm{ID}_i, C_i) \ne (\mathrm{ID}, C^*)$. Finally, $A$ outputs her guess $\sigma^* \in \{0, 1\}$.

Game $G_1$. Game $G_1$ is identical to game $G_0$, except for a small modification to
the encryption oracle. In game $G_1$, steps E4 and E5 of the encryption algorithm
in Fig. 2 are replaced with step E5', and step E8 of the encryption algorithm in
Fig. 2 is replaced with step E8', as follows:

E5'. $s \leftarrow u_1^{p_{1,\mathrm{ID}}} \cdot u_2^{p_{2,\mathrm{ID}}}$
E8'. $v_{\mathrm{ID}} \leftarrow u_1^{f_{1,\mathrm{ID}} + h_{1,\mathrm{ID}}\alpha} \cdot u_2^{f_{2,\mathrm{ID}} + h_{2,\mathrm{ID}}\alpha}$

It is clear that steps E5' and E8' compute the same values as steps E5 and E8,
respectively. The point of these changes is just to make explicit any functional
dependency of the above quantities on $u_1$ and $u_2$. Clearly, it holds that
$\Pr[T_0] = \Pr[T_1]$.

Game $G_2$. To turn game $G_1$ into game $G_2$, we make another change to the
encryption oracle. We replace E1 and E3 with the following:

E1'. $r_1 \leftarrow_R \mathbb{Z}_q$, $r_2 \leftarrow_R \mathbb{Z}_q \setminus \{r_1\}$
E3'. $u_2 \leftarrow g_2^{r_2}$

Notice that while in game $G_1$ the values $u_1$ and $u_2$ are obtained using the same
value $r_1$, in game $G_2$ they are independent subject to $r_1 \ne r_2$. Therefore, using
a standard reduction argument, any non-negligible difference in behaviour
between $G_1$ and $G_2$ can be used to construct a PPT algorithm $A_1$ that is able to
distinguish Diffie-Hellman tuples from totally random tuples with non-negligible
advantage. Hence $|\Pr[T_2] - \Pr[T_1]| \le \epsilon_1$ for some negligible $\epsilon_1$.
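Game $G_3$. To define game $G_3$, we slightly modify the decryption oracle, replacing
steps D2 and D3 with:

D2'. Test if $u_2 = u_1^{w}$ and $v_{\mathrm{ID}} = u_1^{(f_{1,\mathrm{ID}} + h_{1,\mathrm{ID}}\alpha) + w(f_{2,\mathrm{ID}} + h_{2,\mathrm{ID}}\alpha)}$, where $w = \log_{g_1} g_2$;
     halt if this is not the case
D3'. $s \leftarrow u_1^{p_{1,\mathrm{ID}} + w\,p_{2,\mathrm{ID}}}$
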
Let $R_3$ be the event that some decryption query that would have passed the test
in step D2 used in game $G_2$ fails the test in step D2' in game $G_3$. Obviously, $G_2$
and $G_3$ are identical until event $R_3$ occurs. In particular, the events $(T_2 \wedge \neg R_3)$
and $(T_3 \wedge \neg R_3)$ are identical. By Lemma 1, we have $|\Pr[T_3] - \Pr[T_2]| \le \Pr[R_3]$.
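To see why the two games can differ only on ciphertexts with $u_2 \ne u_1^{w}$ (where $w = \log_{g_1} g_2$), note that for well-formed ciphertexts the old and new tests, and the old and new values of $s$, coincide:

```latex
u_2 = u_1^{w}\ \Longrightarrow\
\begin{cases}
u_1^{f_{1,\mathrm{ID}}+h_{1,\mathrm{ID}}\alpha}\,u_2^{f_{2,\mathrm{ID}}+h_{2,\mathrm{ID}}\alpha}
 = u_1^{(f_{1,\mathrm{ID}}+h_{1,\mathrm{ID}}\alpha)+w(f_{2,\mathrm{ID}}+h_{2,\mathrm{ID}}\alpha)}, \\[2pt]
u_1^{p_{1,\mathrm{ID}}}\,u_2^{p_{2,\mathrm{ID}}} = u_1^{p_{1,\mathrm{ID}}+w\,p_{2,\mathrm{ID}}}.
\end{cases}
```

Hence $R_3$ can only be triggered by a query with $u_2 \ne u_1^{w}$ that nevertheless passes the test in step D2.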

We introduce two more games, $G_4$ and $G_5$, in order to bound $\Pr[R_3]$.

Game $G_4$. In this game, we again modify the encryption oracle as follows:

E6'. $e \leftarrow_R \mathbb{Z}_q$, $c \leftarrow g_1^e$

Due to this change, the challenge no longer contains $\sigma$, nor does any other
information in the adversary's view; therefore, we have that $\Pr[T_4] = 1/2$.

Let $R_4$ be the event that some decryption query that would have passed the
test in step D2 used in game $G_2$ fails the test in step D2' in game $G_4$. We show
that these events happen with the same probability as the corresponding events
of game $G_3$. More precisely, we prove that $\Pr[T_4] = \Pr[T_3]$ and $\Pr[R_4] = \Pr[R_3]$.
(The proof will be given in the full version of the paper.)

Game $G_5$. This game is the same as game $G_4$, except for the following
modification. We modify the decryption oracle so that it applies the following special
rejection rule, whose goal is to prevent the adversary from submitting illegal
ciphertexts to the decryption oracle after she has received the challenge $C^*$.

Special rejection rule: After $A$ receives her challenge $C^* = (u_1^*, u_2^*, c^*, v_{\mathrm{ID}}^*)$, the
decryption oracle rejects any query $(\mathrm{ID}_i, C_i)$, with $C_i = (u_1, u_2, c, v_i)$, such
that $(u_1, u_2, c) \ne (u_1^*, u_2^*, c^*)$ but $\alpha = \alpha^*$. It does so before executing step
D2'.

Let $C_5$ be the event that the adversary submits a decryption query that is
rejected using the special rejection rule. Let $R_5$ be the event that $A$ submits some
decryption query that would have passed the test in step D2 used in game $G_2$,
but fails the test in step D2' used in game $G_5$. Clearly, $G_4$ and $G_5$ are identical
until event $C_5$ occurs. In particular, the events $(R_4 \wedge \neg C_5)$ and $(R_5 \wedge \neg C_5)$ are
identical. By Lemma 1, we have $|\Pr[R_5] - \Pr[R_4]| \le \Pr[C_5]$.

We need to show that events $C_5$ and $R_5$ occur with negligible probability.
The argument to bound event $C_5$ is based on the collision-resistance assumption:
using a standard reduction argument, an adversary for which $C_5$ occurs with
non-negligible probability yields a PPT algorithm $A_2$ that finds collisions in $H$
with non-negligible advantage. Hence, we have $\Pr[C_5] \le \epsilon_2$ for some negligible
$\epsilon_2$. Subsequently, we show that event $R_5$ occurs with negligible probability purely
on information-theoretic grounds. That is, $\Pr[R_5] \le Q_A(\lambda)/q$, where $Q_A(\lambda)$ is
an upper bound on the number of decryption queries made by the adversary.
(The detailed proof will be given in the full version of the paper.)

Finally, combining all the intermediate results, we can conclude that adversary
$A$'s advantage is negligible; more precisely:
$\mathrm{Adv}^{\text{IND-CCA}}_{\mathrm{IBE}}(A) \le \epsilon_1 + \epsilon_2 + Q_A(\lambda)/q$. □
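The final bound follows from the triangle inequality over the game sequence, plugging in the intermediate results established above:

```latex
\begin{aligned}
\mathrm{Adv}^{\text{IND-CCA}}_{\mathrm{IBE}}(A) = \Bigl|\Pr[T_0]-\tfrac12\Bigr|
 &\le \underbrace{|\Pr[T_1]-\Pr[T_0]|}_{=0}
    + \underbrace{|\Pr[T_2]-\Pr[T_1]|}_{\le \epsilon_1}
    + \underbrace{|\Pr[T_3]-\Pr[T_2]|}_{\le \Pr[R_3]} \\
 &\quad + \underbrace{|\Pr[T_4]-\Pr[T_3]|}_{=0}
    + \underbrace{\Bigl|\Pr[T_4]-\tfrac12\Bigr|}_{=0}, \\
\Pr[R_3] = \Pr[R_4] &\le \Pr[R_5] + \Pr[C_5] \le \frac{Q_A(\lambda)}{q} + \epsilon_2 .
\end{aligned}
```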
5 Conclusion

We proposed an adaptive IND-CCA secure IBE scheme based on the DDH as- 
sumption. The scheme is provably secure in the standard model assuming the 
adversary can corrupt up to a maximum of k users adaptively. We also presented 
a non-adaptive IND-CPA secure IBE scheme and an adaptive IND-CPA secure 
IBE scheme based on the same assumption. 
Abstract. In an intrusion-resilient cryptosystem [10], two entities (a
user and a base) jointly evolve a secret decryption key; this provides very
strong protection against an active attacker who can break into the user
and base repeatedly and even simultaneously. Recently, a construction of
an intrusion-resilient public-key encryption scheme based on specific
algebraic assumptions has been shown [6]. We generalize this previous work
and present a more generic construction for intrusion-resilient public-key
encryption from any forward-secure public-key encryption scheme satisfying
a certain homomorphic property.



1 Introduction 

The exposure of secret keys can be a devastating attack against a cryptosystem. 
Especially when “standard” cryptanalytic techniques are infeasible, a determined 
attacker might find it much easier to obtain secret keys by hardware tampering, 
or via theft, bribery, or similar means. The problem of key exposure becomes 
more severe as cryptographic algorithms are increasingly used on inexpensive, 
lightweight, and portable consumer devices. 

Key evolution is a powerful defense against the threat of key exposure. As 
an example, in a forward-secure scheme one’s secret key is updated at each time 
period in such a way that key exposure during any time period compromises 
only future time periods (but not past time periods). Forward security was first 
formalized (in the context of signature and identification schemes) by Bellare 
and Miner [2], building on earlier ideas of Anderson [1]; numerous constructions 
of forward-secure signature schemes have been proposed (beginning with [2]). A 
forward-secure public-key encryption scheme has been constructed recently by 
Canetti, Halevi, and Katz [5]. 

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 81-98, 2004. 
Key-insulated cryptosystems [7,3,8] extend the key evolution paradigm to
further limit the damage from key exposure. As with forward security, the user 
(e.g., a mobile device) can perform all cryptographic operations during any par- 
ticular time period on his own. However, to update the user’s secret keys for 
the next time period, the user needs the help of a “base” (e.g., a desktop PC 
in the user’s home). Using this model, one may guarantee that exposure of the 
user’s keys during multiple time periods only compromises security for those 
specific time periods, and not for any other time periods either in the past or in 
the future. A key-insulated scheme is additionally termed “strong” if there is no 
security compromise when the adversary exposes the secrets stored on the base. 

Intrusion-resilience (first proposed in the context of signature schemes by 
Itkis and Reyzin [10]) is a synthesis of forward security and key-insulated se- 
curity. The system model is as in the key-insulated case: the user performs 
cryptographic operations on its own during each time period, and updates its 
key for the next time period with the help of the base. Here, however, a stronger 
security guarantee is provided. If the base and the user are exposed during the 
same time period, then all prior time periods remain secure (as in the case of 
forward security). Otherwise, repeated exposure of both the user and the base
only compromises those specific time periods during which the user's secret keys
were exposed (as in the case of key-insulated security).

The security provided by intrusion-resilient schemes may be further enhanced 
by allowing “refresh” operations between base and user in addition to “update” 
operations. Both of these are key-evolving functions. The difference is that an 
update operation is used only at the beginning of each time period, while any 
number of refresh operations can occur within a single time period. Someone 
who wants to interact with the user needs to know the current time period (i.e., 
number of update operations), but does not need to know how many refresh 
operations have occurred within each time period. Frequent refresh operations 
enhance security, since the attacker must expose user and base between refreshes 
in order to compromise future security. 

Itkis and Reyzin [10] gave a construction of intrusion-resilient signatures 
based on the strong RSA assumption. Subsequently, Itkis [9] showed a generic 
construction of intrusion-resilient signatures from any one-way function. The first 
construction for intrusion-resilient public-key encryption is given in [6]. That 
construction relies on a very specific assumption (the BDH assumption [4]), 
and is based on the forward-secure encryption scheme of [5]. This raises the 
natural question of what assumptions are sufficient to achieve intrusion-resilient 
encryption. In this paper, we make progress on this question by presenting a 
more generic construction for intrusion-resilient public key encryption based on 
any forward-secure encryption scheme satisfying certain properties. In this sense, 
our work generalizes the previous work [6], which constructs an intrusion-resilient
encryption scheme from a specific forward-secure scheme (i.e., that of [5]). It
is hoped that our more generic construction will highlight those properties that 
enable intrusion-resilience and thus shed additional light on this primitive. 
Indeed, the scheme in [6] is somewhat complicated and hard to parse. In
particular, one has to be extremely careful when defining the order of operations
in that scheme; it is not immediately clear which specific properties of the
forward-secure scheme of [5] are critically used, nor what the high-level
intuition behind that construction is. This paper tries to clarify this point by
presenting a more generic construction of intrusion-resilient encryption which 
clearly explains which special properties of the scheme of [5] are used. Specifi- 
cally, we isolate two such crucial properties: a homomorphic structure of the key 
updating operation, and, more importantly, “separability” between the user’s 
key material used for updating from that used for the actual decryption. Indeed, 
we will argue that without such separability it seems impossible (or very hard) 
to build an intrusion-resilient encryption scheme from a forward-secure scheme. 
For that reason, we also give a new, refined definition of forward-secure encryp- 
tion which explicitly models this key separability, and argue that the scheme 
in [5] meets our definition. Then, we give a clean and intuitive construction 
of intrusion-resilient encryption from any such refined forward-secure encryp- 
tion with an extra homomorphic property for key updating. Of course, since 
presently there exists only one specific forward-secure encryption of [5], we can 
currently instantiate our scheme in only one way — the one given in [6] — but 
our exposition hopefully clarifies and explains the design criteria for constructing 
intrusion-resilient encryption. In particular, should a new forward-secure scheme
be found, our construction pin-points the two natural extra properties which
are needed to turn it into an intrusion-resilient scheme (from the same
assumption). And since we argue that such extra properties also seem to be necessary,
our work motivates the design of future forward-secure schemes which satisfy
them as well. Thus, we believe that our generic approach will clarify and simplify future
designs of both forward-secure and intrusion-resilient schemes.

As an additional contribution, we explore a number of alternative models 
and definitions for both forward-secure and intrusion-resilient encryption in Sec- 
tions 2 and 3. Our generic construction appears in Section 4, and Section 5 
contains the proof of security. 

2 Forward-Secure Encryption 

Our intrusion-resilient scheme is built from a forward-secure encryption scheme. 
The notation and model borrow from that of [5], slightly adapted for our
purposes. We let $\mathbb{N}$ be the set of positive integers and let $[T] = \{1, 2, \ldots, T\}$ for
$T \in \mathbb{N}$.

2.1 Functional Description 

We assume a key-evolving encryption scheme in which the user’s secret key can 
be “divided” into two components: an update key and a local key. An update key 
is used only to generate the update key and local key of the next time period, 
but is not used to decrypt a ciphertext. On the other hand, a local key is used 
only to decrypt a ciphertext in the corresponding time period, but is not used
to generate the update key or local key for the next time period. Note that the 
forward-secure encryption scheme of [5] may be viewed in this way. 

More formally, we specify a key-evolving encryption scheme (with the above- 
mentioned property) by the following tuple of polynomial-time algorithms: 

fsKeyGen: key generation algorithm
Input: security parameter $k$, number of time periods $T$
Output: initial user key $sk_0$, public key $pk$

fsKeyUpd: key-update algorithm
Input: current user update key $sk_t$ and time period $t$
Output: next user update key $sk_{t+1}$ and next user local key $lsk_{t+1}$

fsEnc: randomized encryption algorithm
Input: user public key $pk$, current time period $t$, message $M$
Output: ciphertext $C$

fsDec: decryption algorithm
Input: user local secret key $lsk_t$, ciphertext $C = \mathrm{fsEnc}(pk, t, M)$
Output: message $M$

The initial user update key $sk_0$ is not actually used or stored (instead, fsKeyUpd
is applied immediately to generate $sk_1$ and $lsk_1$). Therefore, the sets of keys
which an adversary can access are defined as follows:

$sk = \{sk_t \mid 1 \le t \le T\}$ and $lsk = \{lsk_t \mid 1 \le t \le T\}$.

Remark: Note that in the definition of [5], a single secret key is used both for 
updates and for decryption (instead of having separate keys for updates and 
decryption, as above). We call such a scheme a primitive key-evolving scheme. A 
primitive scheme which is forward-secure is called a PFSE scheme, to distinguish 
it from forward-secure schemes which can additionally be cast as per the above 
definition (these are called FSE schemes). 

2.2 Definition of Security 

We now provide a definition of forward security for a key-evolving encryption
scheme as defined in the previous section. Our definition is stronger than
the definition given in [5] in that we allow the adversary to obtain the local key 
(but not the update key) for time periods prior to the challenge time period. 
Formally, we accomplish this by giving the adversary access to two separate 
oracles: one of which returns local keys, and one of which returns update keys. 
Although this is a stronger definition than that given previously, note that the 
scheme of [5] satisfies it. 

Let $A$ be a probabilistic polynomial-time oracle Turing machine, which gets
input $pk$ and $T$, and interacts with the following oracles:

• Decryption oracle $O_{\mathrm{FSDec}}(sk_0, \cdot, \cdot)$, which on input $t \in [T]$ and a ciphertext
$C$ outputs a message $M$ decrypted by $lsk_t$ (where this key is derived in the
appropriate way from $sk_0$).
• Update-key oracle $O_{\mathrm{FSUkey}}(sk_0, \cdot)$, which on input $t \in [T]$ outputs $sk_t$ (again,
this key is derived in the appropriate way from $sk_0$).

• Local-key oracle $O_{\mathrm{FSLkey}}(sk_0, \cdot)$, which on input $t \in [T]$ outputs $lsk_t$ (again,
this key is derived in the appropriate way from $sk_0$).

• Left-or-right oracle $O_{\mathrm{FSLR}}(pk, \cdot, LR_b(\cdot, \cdot))$, which on inputs $t^* \in [T]$ and
equal-length messages $m_0, m_1$ returns a challenge ciphertext $C^* \leftarrow \mathrm{fsEnc}(pk, t^*, m_b)$.
The bit $b$ is chosen randomly at the outset of the experiment.

The adversary $A$ may query all oracles adaptively, in any order it wants,
subject to the following restrictions: queries $t$ to $O_{\mathrm{FSUkey}}$ satisfy $t > t^*$; queries
$t'$ to $O_{\mathrm{FSLkey}}$ satisfy $t' \ne t^*$; only a single query is made to $O_{\mathrm{FSLR}}$; and the
ciphertext $C^*$ received from $O_{\mathrm{FSLR}}$ may not be queried to $O_{\mathrm{FSDec}}$ for time
period $t^*$. Eventually, the adversary guesses a bit $b'$ and halts. The adversary
succeeds if $b' = b$. We define the adversary's advantage as the absolute value of
the difference between its success probability and 1/2.
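The query restrictions can be made concrete as a check over a complete transcript of oracle calls. This is an illustrative sketch only: the oracle labels `"dec"`, `"ukey"`, `"lkey"` and `"lr"`, the `Query` record, and the representation of ciphertexts as opaque strings are all hypothetical, and we treat a transcript without its single left-or-right query as not legal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Query:
    oracle: str                       # "dec", "ukey", "lkey" or "lr" (hypothetical labels)
    t: int                            # time period t in [T]
    ciphertext: Optional[str] = None  # for "dec" queries and the "lr" answer C*

def is_legal(queries):
    """Check a complete FS-CCA transcript against the restrictions above."""
    lr = [q for q in queries if q.oracle == "lr"]
    if len(lr) != 1:                  # we require the single left-or-right query
        return False
    t_star, c_star = lr[0].t, lr[0].ciphertext
    for q in queries:
        if q.oracle == "ukey" and q.t <= t_star:
            return False              # update keys only for t > t*
        if q.oracle == "lkey" and q.t == t_star:
            return False              # local keys for any t' != t*
        if q.oracle == "dec" and q.t == t_star and q.ciphertext == c_star:
            return False              # C* may not be decrypted at period t*
    return True

assert is_legal([Query("lkey", 1), Query("lr", 3, "C*"), Query("ukey", 4), Query("dec", 3, "C'")])
assert not is_legal([Query("lr", 3, "C*"), Query("ukey", 3)])
```

Because queries may come in any order, the check is applied to the whole transcript rather than query by query: an update-key query made before $t^*$ is chosen is still constrained by $t^*$.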

Definition 1. We say that a key-evolving encryption scheme FSE is forward
secure against chosen-ciphertext attacks (FS-CCA) if the advantage of any ppt
adversary $A$ in the above experiment is negligible.

Remark: We stress that separating the two oracles $O_{\mathrm{FSUkey}}$ and $O_{\mathrm{FSLkey}}$
strengthens the notion of forward security as compared to [5]. Specifically, our
model allows an adversary to get the local key corresponding to any $t' \ne t^*$.

3 Intrusion-Resilient Encryption 

As mentioned in the introduction, intrusion-resilient encryption schemes achieve 
a stronger level of security than forward-secure encryption schemes, at the cost 
of introducing a second entity (i.e., the base). Our definition of security follows 
[10,6]. An adversary is allowed an adaptive chosen-ciphertext attack, can addi- 
tionally obtain the secrets from the base and/or the user, and can eavesdrop on 
the communication between the base and user. As long as the user, the base,
and the communication between user and base are not compromised in the same
time period, the scheme remains secure for all time periods in which the user's
key was not exposed. Furthermore, the scheme achieves forward security in case
the user, base, and communication between user and base are compromised in
the same time period. We now provide formal definitions.



3.1 Functional Description 

The encryption scheme is specified by the following tuple of polynomial-time 
algorithms: 

KeyGen: key generation algorithm
Input: security parameter $k$, number of time periods $T$, number of refreshes $R$
Output: initial user key $sk_{0.0}$, initial base key $skb_{0.0}$, public key $pk$

BaseUpd: base key-update algorithm
Input: current base key $skb_{t.r}$
Output: next base key $skb_{t+1.0}$, key update message $sku_t$

UserUpd: user key-update algorithm
Input: current user key $sk_{t.r}$, key update message $sku_t$
Output: next user key $sk_{t+1.0}$

BaseRef: base key-refresh algorithm
Input: current base key $skb_{t.r}$
Output: next base key $skb_{t.r+1}$, corresponding key refresh message $skr_{t.r}$

UserRef: user key-refresh algorithm
Input: current user key $sk_{t.r}$, key refresh message $skr_{t.r}$
Output: next user key $sk_{t.r+1}$

Enc: randomized encryption algorithm
Input: user public key $pk$, current time interval $t$, message $M$
Output: ciphertext $C$

Dec: decryption algorithm
Input: user secret key $sk_{t.r}$, ciphertext $C = \mathrm{Enc}(pk, t, M)$
Output: message $M$

The encryption scheme is run as follows:

Syntactic(k, T, R)

Set $(sk_{0.0}, skb_{0.0}, pk) \leftarrow \mathrm{KeyGen}(k, T, R)$.

For $t = 0$ to $T - 1$:

Set $(skb_{t+1.0}, sku_t) \leftarrow \mathrm{BaseUpd}(skb_{t.R})$ and $sk_{t+1.0} \leftarrow \mathrm{UserUpd}(sk_{t.R}, sku_t)$ (for $t = 0$, read $skb_{0.R}$ and $sk_{0.R}$ as $skb_{0.0}$ and $sk_{0.0}$).

For $r = 0$ to $R - 1$:

Set $(skb_{t+1.r+1}, skr_{t+1.r}) \leftarrow \mathrm{BaseRef}(skb_{t+1.r})$ and $sk_{t+1.r+1} \leftarrow \mathrm{UserRef}(sk_{t+1.r}, skr_{t+1.r})$.

Here the keys $sk_{t.0}$ and $skb_{t.0}$ for $0 \le t \le T$ are not actually used or stored. Key
generation is immediately followed by an update, and each update is immediately
followed by a refresh. Therefore, the secret keys which an adversary can
potentially access are defined as follows:

• $sk^* = \{sk_{t.r} \mid 1 \le t \le T, 1 \le r \le R\}$

• $skb^* = \{skb_{t.r} \mid 1 \le t \le T, 1 \le r \le R\}$

• $sku^* = \{sku_t \mid 1 \le t \le T - 1\}$

• $skr^* = \{skr_{t.r} \mid 1 \le t \le T - 1, 0 \le r \le R - 1\} \setminus \{skr_{1.0}\}$
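The index bookkeeping of Syntactic can be sketched by tracking key labels only, with no cryptography. This is a minimal Python sketch under stated assumptions: the label format `sk_t.r` and the helper names `syntactic_schedule` and `stored_user_keys` are hypothetical, and we assume the refreshes inside the loop act on period $t+1$, i.e. on the keys just produced by the update. It shows why the refresh-index-0 user keys never appear in $sk^*$.

```python
def syntactic_schedule(T, R):
    """List the (algorithm, outputs) steps run by Syntactic(k, T, R), by label only."""
    steps = [("KeyGen", ("sk_0.0", "skb_0.0", "pk"))]
    for t in range(T):
        # Update: the last keys of period t produce refresh index 0 of period t+1.
        steps.append(("BaseUpd/UserUpd", (f"skb_{t+1}.0", f"sku_{t}", f"sk_{t+1}.0")))
        for r in range(R):
            # Refresh (assumed to act on period t+1): refresh index r -> r+1.
            steps.append(("BaseRef/UserRef",
                          (f"skb_{t+1}.{r+1}", f"skr_{t+1}.{r}", f"sk_{t+1}.{r+1}")))
    return steps


def stored_user_keys(T, R):
    """User keys that survive between steps (refresh index >= 1), i.e. the set sk*."""
    keys = set()
    for _, outputs in syntactic_schedule(T, R):
        for label in outputs:
            # "sk_" excludes skb_/sku_/skr_ labels; refresh-index-0 keys are overwritten at once.
            if label.startswith("sk_") and label.split(".")[1] != "0":
                keys.add(label)
    return keys


print(sorted(stored_user_keys(2, 2)))  # prints ['sk_1.1', 'sk_1.2', 'sk_2.1', 'sk_2.2']
```

Each period contributes one update step and $R$ refresh steps, so the schedule has $1 + T(1 + R)$ entries.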



3.2 Definition of Security 

We now define intrusion-resilience. Let A be a probabilistic polynomial-time
oracle Turing machine which gets input $pk$, $T$, and $R$, and which may query the
following oracles (each oracle is technically indexed by an initial tuple of keys
$(sk_{0.0}, skb_{0.0}, pk)$, which is omitted for readability):

• Decryption oracle $O_{Dec}$, which on input $t \in [T]$, $r \in [R]$, and a ciphertext $C$
outputs a message $M$ decrypted using $sk_{t.r}$

• User key oracle $O_{sk}$, which on input $t \in [T]$ and $r \in [R]$ outputs $sk_{t.r}$

• Base key oracle $O_{bk}$, which on input $t \in [T]$ and $r \in [R]$ outputs $skb_{t.r}$

• Key update oracle $O_u$, which on input $t \in [T]$ outputs $sku_t$

• Key refresh oracle $O_r$, which on input $t \in [T]$ and $r \in [R]$ outputs $skr_{t.r}$

• Left-or-right oracle $O_{LR}$, which on input $t^* \in [T]$ and equal-length messages
$m_0, m_1$ outputs the challenge ciphertext $C^* \leftarrow \mathrm{Enc}(pk, t^*, m_b)$ (for a bit $b$ which is
chosen at random at the beginning of the experiment).

The oracles $O_{sk}$, $O_{bk}$, $O_u$, and $O_r$ are generically called “key exposure oracles”
and are denoted by $O_{sec}$. Queries to a particular oracle are indicated by including
the appropriate string; thus, $O_{sec}(“sk”, t.r)$ denotes the query $O_{sk}(t, r)$.

The only restrictions on the adversary’s queries are that key exposures must
respect erasure. That is, if a value corresponding to a particular instant in time
$t_1$ has been obtained by the adversary (via an oracle query), then a value
corresponding to a prior instant in time (which would have been erased prior to $t_1$)
cannot be obtained. More formally,

• (“sk”, $t.r$) must be queried before (“sk”, $t'.r'$) if $t' > t$, or $t' = t$ and $r' > r$;
• (“bk”, $t.r$) must be queried before (“bk”, $t'.r'$) if $t' > t$, or $t' = t$ and $r' > r$;
• (“bk”, $t.r$) must be queried before (“r”, $t'.r'$) if $t' > t$, or $t' = t$ and $r' > r$;
• (“bk”, $t.r$) must be queried before (“u”, $t'$) if $t' > t$.
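The four erasure rules can be checked mechanically. The following Python sketch uses our own hypothetical encoding of queries as tuples such as `("sk", t, r)`, `("bk", t, r)`, `("r", t, r)`, and `("u", t)`, and rejects any ordered query list in which a value from a later instant was obtained before a value that one of the rules requires to be queried first.

```python
# Each pair (first, then) means: a query of kind `first` at instant t.r must be
# made before any query of kind `then` at a strictly later instant.
RULES = {("sk", "sk"), ("bk", "bk"), ("bk", "r"), ("bk", "u")}


def is_later(kind, inst, ref):
    """Is a query of `kind` at `inst` strictly later than the instant ref = (t, r)?"""
    if kind == "u":  # update queries carry only a time period t'
        return inst[0] > ref[0]
    (t2, r2), (t1, r1) = inst, ref
    return t2 > t1 or (t2 == t1 and r2 > r1)


def respects_erasure(queries):
    """Return True if the ordered query list obeys the erasure restrictions."""
    for j, (kind_j, *inst_j) in enumerate(queries):
        for kind_i, *inst_i in queries[:j]:
            # Query i was made earlier; it is a violation if the rules demand that
            # query j come first and query i refers to a strictly later instant.
            if (kind_j, kind_i) in RULES and is_later(kind_i, inst_i, inst_j):
                return False
    return True
```

For example, `[("u", 3), ("bk", 2, 1)]` is rejected because the base key of period 2 would already have been erased when the period-3 update message was obtained.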

For a set $Q$ of key exposure queries, we say that $sk_{t.r}$ is Q-exposed if one of the
following is true:

• (“sk”, $t.r$) $\in Q$;

• $r > 1$, (“r”, $t.(r-1)$) $\in Q$, and $sk_{t.(r-1)}$ is Q-exposed;

• $r = 1$, (“u”, $t-1$) $\in Q$, and $sk_{(t-1).R}$ is Q-exposed;

• $r < R$, (“r”, $t.r$) $\in Q$, and $sk_{t.(r+1)}$ is Q-exposed.

A completely analogous definition may be given for Q-exposure of a base key
$skb_{t.r}$. We say the scheme is $(t^*, Q)$-compromised if $sk_{t^*.r}$ is Q-exposed (for some
$r$), or if both $sk_{t'.r}$ and $skb_{t'.r}$ are Q-exposed (for some $r$ and $t' < t^*$).
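Because Q-exposure is defined recursively in both directions (forward through refreshes and updates, and backward through refreshes), it is naturally computed as a fixpoint over the finitely many indices $1 \le t \le T$, $1 \le r \le R$. The Python sketch below uses the same hypothetical tuple encoding of queries as above. For base keys it simply reuses the user-key rules with `"bk"` in place of `"sk"`; that is our assumption about what "completely analogous" means and may differ from the intended definition.

```python
def exposed(Q, T, R, key="sk"):
    """Return the set of (t, r) whose key is Q-exposed, iterating the rules to a fixpoint."""
    ex = set()
    changed = True
    while changed:
        changed = False
        for t in range(1, T + 1):
            for r in range(1, R + 1):
                if (t, r) in ex:
                    continue
                if ((key, t, r) in Q                                          # direct exposure
                        or (r > 1 and ("r", t, r - 1) in Q and (t, r - 1) in ex)  # refresh forward
                        or (r == 1 and ("u", t - 1) in Q and (t - 1, R) in ex)    # update forward
                        or (r < R and ("r", t, r) in Q and (t, r + 1) in ex)):    # refresh backward
                    ex.add((t, r))
                    changed = True
    return ex


def compromised(t_star, Q, T, R):
    """(t*, Q)-compromise: sk_{t*.r} exposed, or user and base both exposed at some t' < t*."""
    user, base = exposed(Q, T, R, "sk"), exposed(Q, T, R, "bk")
    if any((t_star, r) in user for r in range(1, R + 1)):
        return True
    return any((t, r) in user and (t, r) in base
               for t in range(1, t_star) for r in range(1, R + 1))
```

For instance, with `Q = {("sk", 2, 1), ("r", 2, 1)}` and $R = 2$, both $sk_{2.1}$ and $sk_{2.2}$ are exposed, so period 2 is compromised while period 3 is not. Adding `("bk", 2, 2)` makes period 3 compromised, but period 1 stays secure, which is the forward-security guarantee.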

We say that an adversary succeeds if it correctly guesses the bit $b$ used by the
$O_{LR}$ oracle, subject to the following restrictions: (1) the system was not
$(t^*, Q)$-compromised, where $O_{LR}$ was queried at time period $t^*$; and (2) the ciphertext
$C^*$ returned by $O_{LR}$ was not queried to $O_{Dec}$ (for the same time period $t^*$). An
adversary’s advantage is defined as the absolute value of the difference between
its success probability and 1/2.

Definition 2. We say that an encryption scheme is intrusion-resilient against
chosen-ciphertext attacks (IR-CCA) if the advantage of any ppt adversary A in
the above experiment is negligible.

Remark: We sometimes refer to the notion defined above as “full” intrusion 
resilience. In Appendix A, we define a security notion called quasi-intrusion 
resilience which lies “in between” key-insulated security and full intrusion- 
resilience. This intermediate notion helps describe the security level which is 
achieved by using a primitive key-evolving encryption scheme. 
4 A Generic Construction of Intrusion-Resilient Encryption

In this section, we present a generic construction of a fully intrusion-resilient 
encryption scheme. 

4.1 Preparations 

Our idea is to extend a forward-secure encryption scheme whose key-update
algorithm is homomorphic in the sense we now describe. Assume a map

$$\phi : G_1 \to G_2 \times G_3,$$

where $G_1$, $G_2$, and $G_3$ are groups represented additively. We say that the map
$\phi$ is homomorphic if for all $x, y \in G_1$ we have:

$$\phi(x + y) = \phi(x) + \phi(y).$$

More precisely, $\phi$ satisfies

$$\phi(x + y) = (x_1 + y_1, x_2 + y_2),$$

where $\phi(x) = (x_1, x_2)$ and $\phi(y) = (y_1, y_2)$.
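To make the homomorphic condition concrete, here is a toy check in Python. We assume, hypothetically, $G_1 = G_2 = G_3 = \mathbb{Z}_q$ under addition and the linear map $\phi(x) = (ax \bmod q, bx \bmod q)$; any such linear map is homomorphic, whereas a map that squares its input is not. This illustrates the algebraic property only and is not a cryptographic construction.

```python
q = 101          # toy modulus (hypothetical)
a, b = 7, 13     # toy coefficients of the linear map


def phi(x):
    """Toy map Z_q -> Z_q x Z_q, linear in x."""
    return (a * x % q, b * x % q)


def add_pairs(u, v):
    """Componentwise addition in Z_q x Z_q."""
    return ((u[0] + v[0]) % q, (u[1] + v[1]) % q)


def is_homomorphic(f, elems):
    """Check f(x + y) == f(x) + f(y) for all pairs drawn from elems."""
    return all(f((x + y) % q) == add_pairs(f(x), f(y)) for x in elems for y in elems)


print(is_homomorphic(phi, range(q)))                        # linear map: True
print(is_homomorphic(lambda x: (x * x % q, x), range(q)))   # squaring: False
```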

To give a generic construction of a fully intrusion-resilient scheme, we specify
the key-evolving encryption scheme FSE as generally as possible. Let $S_1$ be a set,
and let $G_1$, $G_2$, and $G_3$ be groups (written additively). Let FSE be as follows:

FSE = (fsKeyGen, fsKeyUpd, fsEnc, fsDec):

• fsKeyGen: $\{0,1\}^* \times \mathbb{N} \to G_1 \times S_1$; $\mathrm{fsKeyGen}(k, T) = (sk_0, pk)$

• fsKeyUpd: $G_1 \to G_1 \times G_2$; $\mathrm{fsKeyUpd}(sk_t) = (sk_{t+1}, lsk_{t+1})$

Additionally, fsKeyUpd should be homomorphic; that is:

$$\mathrm{fsKeyUpd}(x + y) = \mathrm{fsKeyUpd}(x) + \mathrm{fsKeyUpd}(y).$$

In other words, it satisfies

$$\mathrm{fsKeyUpd}(x + y) = (x_1 + y_1, x_2 + y_2),$$

where $\mathrm{fsKeyUpd}(x) = (x_1, x_2)$ and $\mathrm{fsKeyUpd}(y) = (y_1, y_2)$.

• fsEnc: $S_1 \times \mathbb{N} \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{fsEnc}(pk, t, M) = C$

• fsDec: $G_2 \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{fsDec}(lsk_t, C) = M$

4.2 Scheme Intuition 

As intuition for our construction, note that a secret key of the encryption
scheme FSE consists of $sk_t$ and $lsk_t$, where the local key $lsk_t$ is used only for
decryption. A user update key $sk_t$ of FSE enables derivation of all the user
secret keys for periods $t$ through $T$, but none of the secret keys for periods
$t' < t$. This will allow us to achieve forward security, as in [5]. However, in our
model we also need to divide the user update key between the user and the base,
so that we can derive the sharing for period $t + 1$ from that of period $t$ and also
achieve future security. To achieve this, we let the user store $lsk_t$ (to enable
decryption within the current time period) but additively share the user update
key $sk_t$ between the user and the base. In summary, the user stores $lsk_t$ and
the evolved share of $sk_t$, and the base stores the other evolved share of $sk_t$.
Intuitively, $lsk_t$ by itself only allows the user to decrypt at period $t$, and the
fact that the user update key $sk_t$ is split ensures that exposure of the user
cannot compromise any of the future periods. Security against compromises of
the base follows similarly. This gives us intrusion-resilience.

The only issue to resolve is how to update a local key by using the separated
shares of an update key. The shares of the user and the base are evolved in each
time period, independently by the user and by the base. When each share is
evolved by using the key-update algorithm of FSE, the algorithm outputs two
elements: the share of the next-time-period update key and the share of the
next-time-period local key. The base sends only its share of the local key to the
user as the update message, and the user combines it with his own share of the
local key by using the homomorphic property of the key-update algorithm; thus,
the user derives the next-time-period local key. As a result, the user and the
base evolve their own update-key shares independently and compute the
next-time-period local key jointly. This step is immediately followed by a random
refresh.

4.3 FISER 

We now describe the fully intrusion-resilient encryption scheme FISER =
(KeyGen, BaseUpd, UserUpd, BaseRef, UserRef, Enc, Dec). Each parameter is
defined on the following sets or groups:
• set of user public keys: $S_1$

• group of user secret keys: $G_3 = G_1 \times G_2$

• group of base secret keys: $G_1$

• group of key update messages: $G_2$

• group of key refresh messages: $G_1$

Using the above notation, each function is described as follows:

KeyGen: $\{0,1\}^* \times \mathbb{N} \to G_3 \times G_1 \times S_1$; $\mathrm{KeyGen}(k, T) = (sk_{0.0}, skb_{0.0}, pk)$

1. Compute $(sk_0, pk) \leftarrow \mathrm{fsKeyGen}(k, T)$.

2. Split $sk_0$ as $sk_0 = sks_{0.0} + skb_{0.0}$ for a randomly chosen $sks_{0.0} \in G_1$.

3. Set the public key to $pk$, the user key to $sk_{0.0} = (sks_{0.0}, \cdot)$, and the base key to $skb_{0.0}$.

4. Output $sk_{0.0}$, $skb_{0.0}$, and $pk$.

BaseUpd: $G_1 \to G_1 \times G_2$; $\mathrm{BaseUpd}(skb_{t.r}) = (skb_{t+1.0}, sku_t)$

For an input of base secret key $skb_{t.r}$,

1. Compute $(skb_{t+1.0}, sku_t) \leftarrow \mathrm{fsKeyUpd}(skb_{t.r})$.

2. Output $skb_{t+1.0}$ and $sku_t$.

UserUpd: $G_3 \times G_2 \to G_3$; $\mathrm{UserUpd}(sk_{t.r}, sku_t) = sk_{t+1.0}$

For inputs of user secret key $sk_{t.r} = (sks_{t.r}, lsk_t)$ and update message $sku_t$,

1. Compute $(sks_{t+1.0}, lsk'_{t+1}) \leftarrow \mathrm{fsKeyUpd}(sks_{t.r})$.

2. Compute $lsk_{t+1} = lsk'_{t+1} + sku_t$.

3. Output $sk_{t+1.0} = (sks_{t+1.0}, lsk_{t+1})$.

BaseRef: $G_1 \to G_1 \times G_1$; $\mathrm{BaseRef}(skb_{t.r}) = (skb_{t.r+1}, skr_{t.r})$

For an input of base secret key $skb_{t.r}$,

1. Compute $skb_{t.r+1} = skb_{t.r} - R_{t.r}$ for a random secret $R_{t.r} \in G_1$.

2. Output $skb_{t.r+1}$ and $skr_{t.r} = R_{t.r}$.

UserRef: $G_3 \times G_1 \to G_3$; $\mathrm{UserRef}(sk_{t.r}, skr_{t.r}) = sk_{t.r+1}$

For inputs of user secret key $sk_{t.r} = (sks_{t.r}, lsk_t)$ and refresh message
$skr_{t.r} = R_{t.r}$,

1. Compute $sks_{t.r+1} = sks_{t.r} + R_{t.r}$.

2. Output $sk_{t.r+1} = (sks_{t.r+1}, lsk_t)$.

Enc: $S_1 \times \mathbb{N} \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{Enc}(pk, t, M) = C$
For inputs of a public key $pk$, time $t$, and a message $M$,

1. Compute $C \leftarrow \mathrm{fsEnc}(pk, t, M)$.

2. Output $C$.

Dec: $G_3 \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{Dec}(sk_{t.r}, C) = M$

For inputs of user secret key $sk_{t.r} = (sks_{t.r}, lsk_t)$ and ciphertext $C$,

1. Compute $M \leftarrow \mathrm{fsDec}(lsk_t, C)$.

2. Output $M$.
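The following Python sketch traces FISER's key life cycle to illustrate the two invariants the construction relies on: the user share $sks_{t.r}$ and the base share $skb_{t.r}$ always sum to the FSE update key $sk_t$, and the local key the user assembles equals the one plain FSE would produce. We assume, purely for illustration, $G_1 = G_2 = \mathbb{Z}_q$ and a linear stand-in for fsKeyUpd; this toy update is trivially homomorphic but offers no security, and all function names and parameters are hypothetical.

```python
import random

Q_MOD = 2**61 - 1        # toy group G1 = G2 = Z_q (hypothetical parameters)
A_COEF, B_COEF = 3, 5    # coefficients of the toy linear fsKeyUpd


def fs_key_upd(sk):
    """Toy homomorphic fsKeyUpd: sk_t -> (sk_{t+1}, lsk_{t+1}). Not secure."""
    return (A_COEF * sk) % Q_MOD, (B_COEF * sk) % Q_MOD


def key_gen(sk0, rng):
    """Split sk_0 = sks_{0.0} + skb_{0.0}; the local-key slot starts empty."""
    sks = rng.randrange(Q_MOD)
    return (sks, None), (sk0 - sks) % Q_MOD


def base_upd(skb):
    return fs_key_upd(skb)                      # (skb_{t+1.0}, sku_t)


def user_upd(user_key, sku):
    sks, _ = user_key
    sks_next, lsk_share = fs_key_upd(sks)       # (sks_{t+1.0}, lsk'_{t+1})
    return sks_next, (lsk_share + sku) % Q_MOD  # lsk_{t+1} = lsk'_{t+1} + sku_t


def base_ref(skb, rng):
    r = rng.randrange(Q_MOD)                    # random refresh secret R_{t.r}
    return (skb - r) % Q_MOD, r


def user_ref(user_key, skr):
    sks, lsk = user_key
    return (sks + skr) % Q_MOD, lsk


def run(sk0, T, R, seed=0):
    """Evolve FISER keys for T periods, comparing against unshared FSE evolution."""
    rng = random.Random(seed)
    user, base = key_gen(sk0, rng)
    sk = sk0
    consistent = True
    for _ in range(T):
        base, sku = base_upd(base)
        user = user_upd(user, sku)
        sk, lsk = fs_key_upd(sk)                # reference: plain FSE update
        for _ in range(R):
            base, skr = base_ref(base, rng)
            user = user_ref(user, skr)
        consistent &= (user[0] + base) % Q_MOD == sk and user[1] == lsk
    return consistent


print(run(sk0=123456789, T=5, R=3))  # prints True
```

Since every step here is linear, the checks hold for any seed; with a real FSE scheme, the same algebra is exactly what the homomorphic property of fsKeyUpd provides.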

5 Security Analysis 

We now prove the security of FISER as given above. For simplicity, the time
complexity of an adversary A is defined as the execution time of the experiment
used to define the advantage of A, including the time taken for key generation 
and initialization, as well as the time required for the various oracles to compute 
replies to the adversary’s queries. 

Theorem 1. Let A be an adversary of time complexity $\tau$ with at most $Q$ queries
to oracles $O \in \{O_{Dec}, O_{sec}, O_{LR}\}$ against FISER. If A has advantage $\delta$, then

there exists an adversary B performing a chosen-ciphertext attack against the
underlying FSE with at least the same advantage. The time complexity of B is at
most $\tau + O(\log k)$, and the number of queries is at most $Q$.

Proof. We construct an adversary B that uses A to perform a chosen-ciphertext-and-key
attack against FSE. B is allowed to ask queries to: a decryption oracle
$O_{FSDec}(pk, sk_t, \cdot)$; a user update-key oracle $O_{FSUKey}(pk, sk_0, \cdot)$; a user local-key
oracle $O_{FSLKey}(pk, sk_0, \cdot)$; and a left-or-right oracle $O_{FSLR}(pk, \cdot, LR(\cdot, \cdot, b))$.
Adversary B receives the challenge ciphertext $C^* = \mathrm{fsEnc}(pk, t^*, m_b)$ and outputs
a guess $b'$. Adversary B succeeds if $b' = b$.

B simulates A’s environment as follows: first, B runs A until A outputs $T$ and
$R \in \mathbb{N}$. B also returns $T$. B runs $\mathrm{fsKeyGen}(k, T)$ to produce $(sk_0, pk)$. B chooses
$skb_{0.0} \in G_1$ randomly and maintains a list $U$ which consists of tuples of the
following form:

$$(t, r; sks_{t.r}, skb_{t.r}, R_{t.r}) \in \mathbb{N} \times \mathbb{N} \times G_1 \times G_1 \times G_1.$$

We use the notation $(t, r; sks_{t.r}, -, *)$ as follows: “$-$” is used if there is no entry
for $skb_{t.r}$ in the list (i.e., it is empty), and “$*$” is used if we do not care whether
$R_{t.r}$ is empty or not, or if the entry is kept unchanged by an operation. For example,
“change $(t, r; *, *, -)$ to $(t, r; *, *, R_{t.r})$” means: set the last entry to $R_{t.r}$ while
keeping the entries $sks_{t.r}$ and $skb_{t.r}$ as they are.

To begin, B sets the FISER public key to $pk$ and $U = \{(0, 0; -, skb_{0.0}, -)\}$, and continues the
execution of A on input $pk$, using its oracles to respond to A’s queries as follows:

Decryption oracle. Let a query to $O_{Dec}(pk, sk_{t.r}, \cdot)$ be $(t, r, C)$. B forwards
$(t, C)$ to its decryption oracle $O_{FSDec}(pk, sk_t, \cdot)$ and returns the answer $M$ to
A. From the definition of $O_{Dec}$, the answer is exactly what A’s decryption oracle
would have answered.

Base key oracle. Let a query to $O_{bk}(skb_{0.0}, pk, \cdot, \cdot)$ be $(t, r)$. B conducts the
following steps.

1. If there is $(t, r; *, skb_{t.r}, *)$ in $U$, then pick $skb_{t.r}$ from $U$.

2. Else if there is $(t, r; sks_{t.r}, -, *)$ in $U$, which is exactly a “simultaneous
attack”, then forward $t$ to its user update-key oracle $O_{FSUKey}$, get the answer
$sk_t$, compute

$$skb_{t.r} = sk_t - sks_{t.r},$$

and renew $U$ by using $(t, r; sks_{t.r}, skb_{t.r}, *)$ instead of $(t, r; sks_{t.r}, -, *)$.

3. Else if $r > 1$ and there is $(t, r-1; *, skb_{t.r-1}, R_{t.r-1})$ in $U$, then compute

$$skb_{t.r} = skb_{t.r-1} - R_{t.r-1}$$

and renew $U$ by using $(t, r; *, skb_{t.r}, *)$.

4. Otherwise, choose $skb_{t.r} \in G_1$ randomly and renew $U$ using
$(t, r; -, skb_{t.r}, *)$.

5. Finally, B returns $skb_{t.r}$ to A.

Since $skb_{t.r}$ is exactly what A’s base key oracle would have answered, A’s
view is identical to its view in the attack against FISER.

User key oracle. Let a query to $O_{sk}(sk_{0.0}, pk, \cdot, \cdot)$ be $(t, r)$. B conducts the
following steps.

1. If there is $(t, r; sks_{t.r}, *, *)$ in $U$, then pick $sks_{t.r}$ from $U$.

2. Else if there is $(t, r; -, skb_{t.r}, *)$ in $U$, which is exactly a “simultaneous
attack”, then forward $t$ to its user update-key oracle $O_{FSUKey}$, get the answer
$sk_t$, compute

$$sks_{t.r} = sk_t - skb_{t.r},$$

and renew $U$ by using $(t, r; sks_{t.r}, skb_{t.r}, *)$.

3. Else if $r > 1$ and there is $(t, r-1; sks_{t.r-1}, *, R_{t.r-1})$ in $U$, then compute

$$sks_{t.r} = sks_{t.r-1} + R_{t.r-1}$$

and renew $U$ by using $(t, r; sks_{t.r}, -, *)$.

4. Otherwise, choose $sks_{t.r} \in G_1$ randomly and renew $U$ using
$(t, r; sks_{t.r}, -, *)$.

5. Finally, B returns $sks_{t.r}$ to A.

Since $sks_{t.r}$ is exactly what A’s user key oracle would have answered, A’s
view is identical to its view in the attack against FISER.

Refresh oracle. Let a query to $O_r(skb_{0.0}, pk, \cdot, \cdot)$ be $(t, r)$. B conducts the
following steps.

1. If there is $(t, r; *, *, R_{t.r})$ in $U$, then pick $R_{t.r}$ from $U$.

2. Else if either of the following pairs is in $U$:

$$\{(t, r; sks_{t.r}, *, -), (t, r+1; sks_{t.r+1}, *, -)\}$$

or

$$\{(t, r; *, skb_{t.r}, *), (t, r+1; *, skb_{t.r+1}, *)\},$$

then compute

$$R_{t.r} = sks_{t.r+1} - sks_{t.r} \quad \text{or} \quad R_{t.r} = skb_{t.r} - skb_{t.r+1},$$

and renew $U$ by using $(t, r; sks_{t.r}, *, R_{t.r})$ or $(t, r; *, skb_{t.r}, R_{t.r})$,
respectively.

3. Otherwise, choose $R_{t.r} \in G_1$ randomly and renew $U$ using $(t, r; *, *, R_{t.r})$.

4. Finally, B returns $R_{t.r}$ to A.

Since $R_{t.r}$ is exactly what A’s refresh oracle would have answered, A’s view
is identical to its view in the attack against FISER.
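B's bookkeeping for the base-key, user-key, and refresh queries above can be sketched in Python as follows. The list $U$ is modelled as a dictionary from $(t, r)$ to a mutable triple `[sks, skb, R]`, with `None` standing for "$-$"; `fs_ukey` is a hypothetical stand-in for $O_{FSUKey}$, and $G_1$ is taken to be $\mathbb{Z}_q$ for illustration only. The step numbers in the comments follow the lists in the text.

```python
import random

Q_MOD = 2**61 - 1  # toy group G1 = Z_q, written additively (hypothetical)


class ProofSimulator:
    """Sketch of B's list U: (t, r) -> [sks, skb, R], where None plays the role of "-".

    fs_ukey(t) stands in for B's oracle O_FSUKey and returns the FSE update key sk_t.
    """

    def __init__(self, fs_ukey, seed=0):
        self.fs_ukey = fs_ukey
        self.rng = random.Random(seed)
        self.U = {}

    def _row(self, t, r):
        return self.U.setdefault((t, r), [None, None, None])

    def base_key(self, t, r):
        row, prev = self._row(t, r), self.U.get((t, r - 1))
        if row[1] is None:                                   # step 1 fails
            if row[0] is not None:                           # step 2: simultaneous attack
                row[1] = (self.fs_ukey(t) - row[0]) % Q_MOD
            elif r > 1 and prev and prev[1] is not None and prev[2] is not None:
                row[1] = (prev[1] - prev[2]) % Q_MOD         # step 3: skb_{t.r-1} - R_{t.r-1}
            else:
                row[1] = self.rng.randrange(Q_MOD)           # step 4: fresh random value
        return row[1]                                        # step 5

    def user_key(self, t, r):
        row, prev = self._row(t, r), self.U.get((t, r - 1))
        if row[0] is None:
            if row[1] is not None:                           # step 2: simultaneous attack
                row[0] = (self.fs_ukey(t) - row[1]) % Q_MOD
            elif r > 1 and prev and prev[0] is not None and prev[2] is not None:
                row[0] = (prev[0] + prev[2]) % Q_MOD         # step 3: sks_{t.r-1} + R_{t.r-1}
            else:
                row[0] = self.rng.randrange(Q_MOD)           # step 4
        return row[0]

    def refresh(self, t, r):
        row = self._row(t, r)
        nxt = self.U.get((t, r + 1), [None, None, None])
        if row[2] is None:
            if row[0] is not None and nxt[0] is not None:
                row[2] = (nxt[0] - row[0]) % Q_MOD           # R = sks_{t.r+1} - sks_{t.r}
            elif row[1] is not None and nxt[1] is not None:
                row[2] = (row[1] - nxt[1]) % Q_MOD           # R = skb_{t.r} - skb_{t.r+1}
            else:
                row[2] = self.rng.randrange(Q_MOD)           # step 3
        return row[2]


sim = ProofSimulator(fs_ukey=lambda t: 1000 + t)   # hypothetical sk_t values
u, b = sim.user_key(2, 1), sim.base_key(2, 1)
print((u + b) % Q_MOD == 1002)                     # shares consistent with sk_2: True
```

The point of the sketch is that every answer is either fresh randomness or forced by answers already given, so A never sees shares that fail to sum to $sk_t$ or refresh values that contradict the shares.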

Update oracle. Let a query to $O_u(skb_{0.0}, pk, \cdot)$ be $t$. B does as follows:

1. If there is $(t, R; *, skb_{t.R}, *)$ in $U$, then compute

$$(skb_{t+1.0}, sku_t) \leftarrow \mathrm{fsKeyUpd}(skb_{t.R}).$$

2. Else if there is $(t, R; sks_{t.R}, -, *)$ in $U$, forward $t + 1$ to its local-key oracle
$O_{FSLKey}(pk, sk_0, \cdot)$, obtain the answer $lsk_{t+1}$, and compute

$$(sks_{t+1.0}, lsk'_{t+1}) \leftarrow \mathrm{fsKeyUpd}(sks_{t.R}) \quad \text{and} \quad sku_t = lsk_{t+1} - lsk'_{t+1}.$$

3. Otherwise, choose $sks_{t.R} \in G_1$ randomly, forward $t + 1$ to its local-key oracle
$O_{FSLKey}(pk, sk_0, \cdot)$, obtain the answer $lsk_{t+1}$, compute

$$(sks_{t+1.0}, lsk'_{t+1}) \leftarrow \mathrm{fsKeyUpd}(sks_{t.R}) \quad \text{and} \quad sku_t = lsk_{t+1} - lsk'_{t+1},$$

and renew $U$ by using $(t, R; sks_{t.R}, -, *)$.

4. Finally, B returns $sku_t$ to A.

Since $sku_t$ is exactly what A’s update key oracle would have answered, A’s
view is identical to its view in the attack against FISER.
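Steps 2 and 3 of the update oracle rely on the homomorphic property: with only the user share $sks_{t.R}$ fixed, B can still produce exactly the update message a real base would send. The Python sketch below checks this with the same hypothetical linear toy fsKeyUpd over $\mathbb{Z}_q$ (not secure); `simulate_update_message` is our name for B's computation $sku_t = lsk_{t+1} - lsk'_{t+1}$.

```python
import random

Q_MOD = 2**61 - 1        # toy group Z_q (hypothetical)
A_COEF, B_COEF = 3, 5    # toy linear fsKeyUpd coefficients


def fs_key_upd(sk):
    """Toy homomorphic fsKeyUpd: sk_t -> (sk_{t+1}, lsk_{t+1})."""
    return (A_COEF * sk) % Q_MOD, (B_COEF * sk) % Q_MOD


def simulate_update_message(sks_tR, lsk_next):
    """B's answer sku_t = lsk_{t+1} - lsk'_{t+1}; lsk_next comes from O_FSLKey(t+1)."""
    _, lsk_share = fs_key_upd(sks_tR)
    return (lsk_next - lsk_share) % Q_MOD


rng = random.Random(1)
sk_t = rng.randrange(Q_MOD)              # FSE update key, unknown to B
sks = rng.randrange(Q_MOD)               # user share chosen by B
lsk_next = fs_key_upd(sk_t)[1]           # what O_FSLKey(t+1) would return
sku = simulate_update_message(sks, lsk_next)

real_sku = fs_key_upd((sk_t - sks) % Q_MOD)[1]   # message from a real base holding sk_t - sks
user_lsk = (fs_key_upd(sks)[1] + sku) % Q_MOD    # UserUpd step 2 on A's side
print(sku == real_sku, user_lsk == lsk_next)     # prints True True
```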

Left-or-right oracle. Let a query to $O_{LR}(pk, \cdot, LR(\cdot, \cdot, b))$ be $(t^*, m_0, m_1)$. Then
B forwards $(t^*, m_0, m_1)$ to its left-or-right oracle $O_{FSLR}(pk, \cdot, LR(\cdot, \cdot, b))$, obtains
a ciphertext $C^*$, and returns $C^*$ to A. From the definition of Enc, the answer is
exactly what A’s left-or-right oracle would have answered.

When A outputs its guess bit $b'$ and halts, B returns $b'$ and halts. Note that even
if A makes queries to more than one oracle $O \in O_{sec}$ for the same time/refresh
period $(t, r)$, adversary A does not see any inconsistencies among the answers
from these oracle queries unless the scheme becomes $(t^*, Q)$-compromised (where
$Q$ represents the queries of A to $O_{sec}$ up to and including that point in time);
this assumes that A respects erasure. Furthermore, B queries $O_{FSUKey}$ if and
only if A queries both $O_{sk}$ and $O_{bk}$ for the same time/refresh period $(t, r)$. That
is, the earliest time period queried to both $O_{sk}$ and $O_{bk}$ by A coincides with
the time period submitted to $O_{FSUKey}$ by B. Therefore B succeeds whenever A does.

From the above simulation by B, we see that the time complexity of B is at
most $\tau + O(\log k)$ and that B makes at most $Q$ queries to its oracles.

Table 1. Abstraction of each security notion

Scheme     Underlying notion    Achieved security level
KIS [3]    IBE                  key-insulated
QISER      PFSE + IBE           quasi-intrusion-resilient
FISER      FSE                  intrusion-resilient

6 Further Discussion 

There are several security notions of key-evolving or key-updating encryption 
schemes: forward-secure encryption as defined by [5] (called PFSE), forward- 
secure encryption (FSE) as defined here (recall, in our model the secret key 
is split into a key used for decryption and a key used for updates), key-insulated 
encryption [7], and intrusion-resilient encryption [6]. These notions and the no- 
tion of ID-based encryption (IBE) [4] are related; this has already been noted in 
[7,3,8]. We summarize the relation here. 
Any secure ID-based encryption scheme IBE with a certain homomorphic
property can be transformed to achieve key-insulated security, following [3].
We denote this construction by KIS. Unfortunately, this scheme is insecure in
case both user and base are corrupted (indeed, the scheme was not designed
with this security property in mind).

Our results show that FSE with a certain homomorphic property is sufficient
to achieve intrusion resilience. This raises the natural question of whether a
generic PFSE scheme can be transformed to achieve intrusion resilience.
Unfortunately, the answer seems to be “no” in general (at least using a “simple”
construction as shown here), even if we assume that the key-update algorithm
is appropriately homomorphic. More formally, any PFSE scheme which can be
converted in this way can actually be cast as an FSE scheme anyway.

We briefly discuss why. Intuitively, both the user and the base must share 
the secret key of the PFSE scheme in order to achieve intrusion resilience. This 
requires that no single entity can have enough control to cause any security 
concerns. On the other hand, the user needs to decrypt a ciphertext. This indi- 
cates some separation between keys used for decryption and keys used for key 
updates. It would be interesting to formalize and rigorously prove the above 
informal reasoning. 

This raises a further question: what level of security can be achieved by using
PFSE? In Appendix B, we show that any primitive forward-secure encryption scheme,
together with any secure ID-based encryption scheme that satisfies a certain
homomorphic property, can be transformed to achieve quasi-intrusion-resilience.
The construction is called QISER, and the definition of quasi-intrusion-resilience
is given in Appendix A. The relation among these security notions is summarized
in Table 1.

Remark: The Boneh-Franklin ID-based encryption scheme satisfies the necessary 
homomorphic property. Therefore, a forward-secure encryption scheme (e.g., [5]) 
combined with this IBE scheme satisfies quasi-intrusion-resilience. 
A Definition of Quasi-intrusion Resilience

We introduce the notion of quasi-intrusion resilience, which lies “in between”
key-insulated security and intrusion resilience. Informally, the security obtained is as
follows: corrupting both the base and the user at the same time period means
that any period before the first user corruption is secure; otherwise, repeated
exposure of the user and the base only compromises those specific time periods
during which the user’s secret keys were exposed.

We generalize the notion of $(t^*, Q)$-compromise from Section 3.2 by considering
two disjoint scenarios, simultaneous and non-simultaneous corruption.
We call a corruption simultaneous if both user and base were compromised
for the same time period and refresh period; otherwise we call the corruption
non-simultaneous. More formally, we say the scheme is $(t^*, Q)$-simultaneous-compromised
if one of the following is true:

• $sk_{t^*.r}$ is Q-exposed (for some $r$); or

• both $sk_{t'.r}$ and $skb_{t'.r}$ are Q-exposed (for some $t'$ and $r$).

We say the scheme is $(t^*, Q)$-non-simultaneous-compromised if:

• $sk_{t^*.r}$ is Q-exposed (for some $r$); or

• both $sk_{t'.r}$ and $skb_{t'.r}$ are Q-exposed (for some $r$ and $t' < t^*$); or

• $sk_{t'.r}$ and $skb_{t'.r}$ are never both Q-exposed (for any $r$ and $t' > t^*$).

One can consider definitions in which $(t^*, Q)$-simultaneous-compromise is
disallowed, or in which $(t^*, Q)$-non-simultaneous-compromise is disallowed. A scheme
secure against any adversary for which it is not $(t^*, Q)$-simultaneous-compromised is
called non-simultaneous-compromise secure; the opposite case gives a system
which is simultaneous-compromise secure. Obviously, an encryption scheme is
(fully) intrusion resilient if and only if it is both simultaneous-compromise and
non-simultaneous-compromise secure.

By slightly modifying the condition of $(t^*, Q)$-non-simultaneous-compromise, we
may define a system as $(t^*, Q)$-quasi-non-simultaneous-compromised if:

• $sk_{t.r}$ is Q-exposed (for some $r$ and $t \le t^*$); or
• both $sk_{t'.r}$ and $skb_{t'.r}$ are Q-exposed (for some $r$ and $t' < t^*$); or
• $sk_{t'.r}$ and $skb_{t'.r}$ are never both Q-exposed (for any $r$ and $t' > t^*$).

We call a scheme quasi-simultaneous-compromise secure if it is secure in the
CCA1 variant of the intrusion-resilience experiment against adversaries for which
it is not $(t^*, Q)$-quasi-non-simultaneous-compromised; that is, the adversary makes
no queries after receiving the challenge ciphertext $C^*$ from $O_{LR}$, and the scheme
is not $(t^*, Q)$-quasi-non-simultaneous-compromised. The notion of
quasi-intrusion-resilience is given as follows.

Definition 3. We say that an encryption scheme is quasi-intrusion-resilient
against chosen-ciphertext attacks (QIR-CCA) if it is both quasi-simultaneous-compromise
and non-simultaneous-compromise secure.

We may note that the definition of quasi-intrusion-resilience is rather ad-hoc in 
the current version. 

B Generic Quasi-intrusion-Resilient Encryption 

B.1 Preparations

Our idea is to combine a primitive forward-secure encryption scheme with
a secure ID-based encryption scheme whose key-extraction algorithm has a
homomorphic-like property. We define this property for a map

$$\phi : G_1 \times S \to G_2,$$

where both $G_1$ and $G_2$ are groups and $S$ is a set. The operations of $G_1$ and $G_2$
are written additively, and $S$ does not have to be a group. We say that
the map $\phi$ has a homomorphic-like property if for all $s_1, s_2 \in G_1$ and all $t \in S$,

$$\phi(s_1 + s_2, t) = \phi(s_1, t) + \phi(s_2, t).$$
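As an illustration of the homomorphic-like property, consider a toy key extraction $\phi(s, t) = s \cdot h(t) \bmod q$, which mimics, in additive notation, the Boneh-Franklin extraction of the master secret times the hash of the identity. Here $G_1 = G_2 = \mathbb{Z}_q$ and the SHA-256-based $h$ are hypothetical stand-ins and the sketch has no security; it only shows that the property holds for every fixed $t$, while $t$ itself need not belong to a group.

```python
import hashlib

Q_MOD = 2**61 - 1  # toy prime order (hypothetical)


def h(t):
    """Hash an identity / time period t (any printable value) into Z_q."""
    digest = hashlib.sha256(str(t).encode()).digest()
    return int.from_bytes(digest, "big") % Q_MOD


def key_ext(s, t):
    """Toy extraction phi(s, t) = s * h(t) mod q; linear in s for each fixed t."""
    return (s * h(t)) % Q_MOD


s1, s2 = 123456789, 987654321
for t in [0, 1, 2, "id-7"]:
    assert key_ext((s1 + s2) % Q_MOD, t) == (key_ext(s1, t) + key_ext(s2, t)) % Q_MOD
print("homomorphic-like property holds for the sampled t")
```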

Now we give a general construction of a quasi-intrusion-resilient scheme. Let $S_2$
and $S_3$ be sets, which are used in PFSE; we do not require any group property
for PFSE. Let $G_4$ and $G_5$ be groups, which are used in IBE. The group operations
are written additively.

• PFSE = (pfsKeyGen, pfsKeyUpd, pfsEnc, pfsDec):
◦ pfsKeyGen: $\{0,1\}^* \times \mathbb{N} \to S_3 \times S_2$; $\mathrm{pfsKeyGen}(k, T) = (sk_0, pk)$
◦ pfsKeyUpd: $S_3 \to S_3$; $\mathrm{pfsKeyUpd}(sk_t) = sk_{t+1}$
◦ pfsEnc: $S_2 \times \mathbb{N} \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{pfsEnc}(pk, t, M) = C$
◦ pfsDec: $S_3 \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{pfsDec}(sk_t, C) = M$
ID-based encryption consists of key-generation, key-extraction, encryption,
and decryption algorithms.

• IBE = (IBKeyGen, IBKeyExt, IBEnc, IBDec):
◦ IBKeyGen: $\{0,1\}^* \to G_4 \times G_5$; $\mathrm{IBKeyGen}(k) = (s_0, pk_I)$

Input: security parameter $k$

Output: master secret $s_0$, public key $pk_I$

◦ IBKeyExt: $G_4 \times \mathbb{N} \to G_5$; $\mathrm{IBKeyExt}(s_0, t) = isk_t$

Input: user ID $t$ and secret $s_0$

Output: user secret key $isk_t$

IBKeyExt has to satisfy a homomorphic-like property for $G_4$ and $G_5$:

$$\mathrm{IBKeyExt}(s_1 + s_2, t) = \mathrm{IBKeyExt}(s_1, t) + \mathrm{IBKeyExt}(s_2, t).$$

◦ IBEnc: $G_5 \times \mathbb{N} \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{IBEnc}(pk_I, t, M) = C$

Input: public key $pk_I$, user ID $t$, message $M$
Output: ciphertext $C$

◦ IBDec: $G_5 \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{IBDec}(isk_t, C) = M$
Input: user secret key $isk_t$, ciphertext $C = \mathrm{IBEnc}(pk_I, t, M)$

Output: message $M$

B.2 QISER 

Let us describe the quasi-intrusion-secure encryption scheme QISER = (KeyGen,
BaseUpd, UserUpd, BaseRef, UserRef, Enc, Dec). Here, user secret keys, user
public keys, base secret keys, key update messages, and key refresh messages are
defined on the following sets or groups:
• set of user public keys: $S_5 = S_2 \times G_5$

• set of user secret keys: $S_4 = S_3 \times G_4 \times G_5$

• group of base secret keys: $G_4$
• group of key update messages: $G_6 = G_4 \times G_5$
• group of key refresh messages: $G_4$

KeyGen: $\{0,1\}^* \times \mathbb{N} \to S_4 \times G_4 \times S_5$; $\mathrm{KeyGen}(k, T) = (sk_{0.0}, skb_{0.0}, pk)$

For inputs of security parameter $k$ and time $T$,

1. Set $(s_0, pk_I) \leftarrow \mathrm{IBKeyGen}(k)$ and $isk_0 \leftarrow \mathrm{IBKeyExt}(s_0, 0)$.

2. Split $s_0$ as $s_0 = sks_{0.0} + skb_{0.0}$ for a randomly chosen $sks_{0.0} \in G_4$.

3. Compute $(sk_0, pk) \leftarrow \mathrm{pfsKeyGen}(k, T)$.

4. Set the QISER public key to $(pk, pk_I)$ and $sk_{0.0} = (sk_0, sks_{0.0}, isk_0)$.

5. Output $sk_{0.0}$, $skb_{0.0}$, and the public key.

BaseUpd: $G_4 \to G_4 \times G_6$; $\mathrm{BaseUpd}(skb_{t.r}) = (skb_{t+1.0}, sku_t)$

For an input of base secret key $skb_{t.r}$,

1. Compute $skb_{t+1.0} = skb_{t.r} - l_t$ for a random secret $l_t \in G_4$.

2. Compute $u_t \leftarrow \mathrm{IBKeyExt}(skb_{t+1.0}, t + 1)$.

3. Output $skb_{t+1.0}$ and $sku_t = (l_t, u_t)$.

UserUpd: $S_4 \times G_6 \to S_4$; $\mathrm{UserUpd}(sk_{t.r}, sku_t) = sk_{t+1.0}$

For inputs of user secret key $sk_{t.r} = (sk_t, sks_{t.r}, isk_t)$ and update message
$sku_t = (l_t, u_t)$,

1. Compute $sks_{t+1.0} = sks_{t.r} + l_t$.

2. Compute $isk_{t+1} = \mathrm{IBKeyExt}(sks_{t+1.0}, t + 1) + u_t$.

3. Compute $sk_{t+1} \leftarrow \mathrm{pfsKeyUpd}(sk_t)$.

4. Output $sk_{t+1.0} = (sk_{t+1}, sks_{t+1.0}, isk_{t+1})$.

BaseRef: $G_4 \to G_4 \times G_4$; $\mathrm{BaseRef}(skb_{t.r}) = (skb_{t.r+1}, skr_{t.r})$

For an input of base secret key $skb_{t.r}$,

1. Compute $skb_{t.r+1} = skb_{t.r} - l_{t.r}$ for a random secret $l_{t.r} \in G_4$.

2. Output $skb_{t.r+1}$ and $skr_{t.r} = l_{t.r}$.

UserRef: $S_4 \times G_4 \to S_4$; $\mathrm{UserRef}(sk_{t.r}, skr_{t.r}) = sk_{t.r+1}$

For inputs of user secret key $sk_{t.r} = (sk_t, sks_{t.r}, isk_t)$ and refresh message
$skr_{t.r}$,

1. Compute $sks_{t.r+1} = sks_{t.r} + skr_{t.r}$.

2. Output $sk_{t.r+1} = (sk_t, sks_{t.r+1}, isk_t)$.

Enc: $S_5 \times \mathbb{N} \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{Enc}(pk, t, M) = C$
For inputs of a public key $(pk, pk_I)$, time $t$, and a message $M$,

1. Compute $C \leftarrow \mathrm{IBEnc}(pk_I, t, \mathrm{pfsEnc}(pk, t, M))$.

2. Output $C$.

Dec: $S_4 \times \{0,1\}^* \to \{0,1\}^*$; $\mathrm{Dec}(sk_{t.r}, C) = M$

For inputs of user secret key $sk_{t.r} = (sk_t, sks_{t.r}, isk_t)$ and a ciphertext $C$,

1. Compute $M \leftarrow \mathrm{pfsDec}(sk_t, \mathrm{IBDec}(isk_t, C))$.

2. Output $M$.
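The QISER update can be checked the same way: because IBKeyExt is homomorphic-like, the user's $isk_{t+1}$ equals $\mathrm{IBKeyExt}(s_0, t+1)$ even though, after key generation, $s_0$ exists only as two additive shares. The Python sketch below uses a hypothetical toy extraction over $\mathbb{Z}_q$ (no security) and tracks only the IBE component of the keys; the PFSE component $sk_t$ evolves independently via pfsKeyUpd and is omitted.

```python
import hashlib
import random

Q_MOD = 2**61 - 1  # toy group G4 = G5 = Z_q (hypothetical)


def ib_key_ext(s, t):
    """Toy homomorphic-like extraction s * h(t) mod q (Boneh-Franklin-style, not secure)."""
    hv = int.from_bytes(hashlib.sha256(str(t).encode()).digest(), "big") % Q_MOD
    return (s * hv) % Q_MOD


def base_upd(skb, t, rng):
    l_t = rng.randrange(Q_MOD)
    skb_next = (skb - l_t) % Q_MOD                          # skb_{t+1.0} = skb_{t.r} - l_t
    return skb_next, (l_t, ib_key_ext(skb_next, t + 1))    # sku_t = (l_t, u_t)


def user_upd(sks, sku, t):
    l_t, u_t = sku
    sks_next = (sks + l_t) % Q_MOD                          # sks_{t+1.0} = sks_{t.r} + l_t
    return sks_next, (ib_key_ext(sks_next, t + 1) + u_t) % Q_MOD  # isk_{t+1}


def refresh(sks, skb, rng):
    l_tr = rng.randrange(Q_MOD)                             # skr_{t.r}
    return (sks + l_tr) % Q_MOD, (skb - l_tr) % Q_MOD


def check(s0, T, R, seed=1):
    """Return True if the user's isk_t matches IBKeyExt(s0, t) in every period."""
    rng = random.Random(seed)
    sks = rng.randrange(Q_MOD)
    skb = (s0 - sks) % Q_MOD                                # s0 = sks_{0.0} + skb_{0.0}
    for t in range(T):
        skb, sku = base_upd(skb, t, rng)
        sks, isk = user_upd(sks, sku, t)
        for _ in range(R):
            sks, skb = refresh(sks, skb, rng)
        if isk != ib_key_ext(s0, t + 1):
            return False
    return True


print(check(987654321, T=6, R=2))  # prints True
```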



B.3 Security Analysis 

The following theorems will be proved in the final version. 

Theorem 2. Let A be an adversary of time complexity $\tau$ with at most $Q$ queries
to oracles $O \in \{O_{Dec}, O_{sec}, O_{LR}\}$ against QISER. If A has non-negligible
advantage under non-simultaneous compromise, then there exists an adversary B
performing a chosen-ciphertext attack against the underlying IBE with at least
the same advantage. The time complexity of B is at most $\tau + O(\log k)$, and the
number of queries is at most $Q$.

Theorem 3. Let A be an adversary of time complexity $\tau$ with at most $Q$ queries
to oracles $O \in \{O_{Dec}, O_{sec}, O_{LR}\}$ against QISER. If A has non-negligible
advantage under quasi-simultaneous-compromise, then there exists an adversary B
performing a chosen-ciphertext attack against the underlying PFSE with at least
the same advantage. The time complexity of B is at most $\tau + O(\log k)$, and the
number of queries is at most $Q$.
Abstract. In this paper, we propose the security notion of certificate-
based signature that uses the same parameters and certificate revocation 
strategy as the encryption scheme presented at Eurocrypt 2003 by Gen- 
try. Certificate-based signature preserves advantages of certificate-based 
encryption, such as implicit certification and no private key escrow. We 
present concrete certificate-based signature schemes derived from pair- 
ings on elliptic curves and prove their security in the random oracle 
model assuming that the underlying group is GDH. Additionally, we pro- 
pose a concrete delegation-by-certificate proxy signature scheme which is 
derived from a certificate-based signature scheme after simple modifica- 
tions. Our proxy scheme is provably secure in the random oracle model 
under the security notion defined by Boldyreva, Palacio and Warinschi. 



1 Introduction 

1.1 Certificate-Based Cryptosystem 

In traditional public key signatures (PKS), the public key of the signer is
essentially a random bit string picked from a given set. So, the signature does not
by itself provide the authorization of the signer. This problem can be solved via
a certificate, which provides an unforgeable and trusted link between the public
key and the identity of the signer by means of the CA’s signature. A hierarchical
framework called a public key infrastructure (PKI) issues and manages
certificates. In general, the signer registers its own public key with its
identity in a certificate server, and anyone wishing to obtain the signer’s public key
requests it by sending the server the identity of the signer. Before
verifying a signature using the signer’s public key, however, a verifier must obtain
the signer’s certification status, and hence in general must query the CA about the
signer’s certificate status. This is called a third-party query. As mentioned in
[11], even though third-party queries cause some problems in public-key
encryption, those problems can be surmounted in signature schemes simply by
transmitting the certificate for the valid public key together with the signature.
Despite this solution, a verifier must verify the certificate first and, if the CA’s
authorization of the signer’s public key is valid, then verify the signed message
with the given public key of the signer. From the point of view of a verifier, two
verification steps for independent signatures are needed. As a consequence, this
system requires a large amount of computing time and storage when the number
of users increases rapidly.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 99-111, 2004.

To simplify key management procedures of conventional PKIs, Shamir proposed
ID-based cryptography (IBC) in 1984 [18], and more recently Boneh and Franklin
[2] proposed a practical ID-based encryption (IBE) scheme based on bilinear
maps. Subsequently, several ID-based signature (IBS) schemes which share
system parameters with the IBE scheme of Boneh and Franklin have been proposed
[17,12,8]. The main practical benefit of IBC is in greatly reducing the need for, and
reliance on, public key certificates. But IBC uses a trusted third party called
a Private Key Generator (PKG). The PKG generates the secret keys of all of
its users, so a user can decrypt only if the PKG has given a secret key to it (so
certification is implicit), which reduces the amount of storage and computation.
On the other hand, private key escrow is inherent, and secret keys must be sent
over secure channels, making private key distribution difficult [11].

To bring several merits of IBC into conventional PKIs, Gentry introduced the concept of certificate-based encryption (CBE) [11]. A CBE scheme, created by combining a public key encryption (PKE) scheme with an IBE scheme, involves a certifier and users. Each user generates its own secret and public key pair and requests a certificate from the CA; the CA then uses the user private key generation algorithm of the Boneh-Franklin IBE scheme to generate certificates. This gives implicit certification, because a certificate also acts as a decryption key, and so eliminates third-party queries on certificate status. However, this basic CBE scheme is inefficient when the CA has a large number of users and performs frequent certificate updates, so Gentry suggests using subset covers to overcome the inefficiency [11].

1.2 Our Contributions 

We give the first construction of a certificate-based signature (CBS) scheme that can use the same parameters and certificate revocation strategy as the CBE scheme of [11]. In Section 2, we define a formal security model for CBS, and in Section 3 we describe two similar pairing-based CBS schemes that are secure in the random oracle model. Our schemes retain most of the advantages of CBE over PKE and IBE. Both schemes use an IBS scheme in the signing phase and the BLS signature scheme [4] in the certificate issuing phase; one uses a multisignature and the other an aggregate signature as a temporary signing key, which provides implicit certification. Since the CA does not know a user’s personal secret key, CBS does not suffer from the key escrow property inherent in IBC, and since the CA’s certificate need not be kept secret, there is no secret key distribution problem.

We show in Section 4 that a delegation-by-certificate proxy signature scheme follows immediately. A proxy signature permits an entity to delegate its signing rights to another entity. In the basic model of proxy signature schemes, the original signer creates a signature on delegation information and gives it to the proxy signer, who then uses it to generate a proxy key pair. This is analogous to the certificate issuing and temporary signing key generation phases of CBS. Based on this observation, we slightly modify our CBS scheme and prove that the resulting proxy signature scheme is secure in the random oracle model, assuming that the underlying group is GDH.



1.3 Related Works 

In the self-certified signatures (SCS) proposed by Lee and Kim [16], a signer computes a temporary signing key from its secret key and certification information together, and uses this temporary key to sign a message together with the certification information. A verifier then checks both the signer’s signature on the message and the related certification information at once. Both SCS and CBS thus provide the authenticity of a digital signature and the authorization of a public key simultaneously. However, SCS and CBS differ in some respects. SCS does not address the certificate revocation problem, which is the main contribution of CBS. It only specifies how to sign a message and verify a signature using a long-lived key pair together with the corresponding certificate, and it provides explicit authentication of a public key.

The notion of certificateless public key signature (CL-PKS) presented by Al-Riyami and Paterson [1] does not require the use of certificates. In CL-PKS, the Key Generation Center (KGC) supplies a user with a partial secret key, which the KGC computes from the user’s identity and a master key; the user then combines its partial secret key and the KGC’s public parameters with some secret information to generate its actual secret key and public key, respectively. In this way, a user’s secret key is not available to the KGC, but the KGC must send the partial secret keys over secure channels. Moreover, the KGC must be trusted not to replace users’ public keys, because a new public key could also have been created by the KGC, and it cannot easily be decided which is the case. This rather strong security assumption can be weakened by a slight modification: a user first generates its public key and binds it to its identity, and the result serves as the user’s new identity. The user sends it to the KGC to obtain a partial secret key. This technique makes the trust level of the KGC apparent and equivalent to that of the CA in conventional PKIs. Independently, Chen, Zhang and Kim [9] apply the same idea to the IBS scheme of Cha and Cheon [8]. Although these schemes remove the key escrow property, they still require secure channels and are less efficient than our CBS scheme.

2 Preliminaries 

In this section, we review some definitions and provide a formal security model 
necessary to build our signature scheme. We refer the reader to [2,4,10,13] for 
a discussion of how to build a concrete instance using supersingular curves and 
compute the bilinear map. 
2.1 Cryptographic Assumptions

Let $G_1$ and $G_2$ be two cyclic groups of some large prime order $q$. We view $G_1$ as an additive group and $G_2$ as a multiplicative group. A bilinear map $e : G_1 \times G_1 \to G_2$ between these two groups, called an admissible pairing, must satisfy the following properties [2,11]:

1. Bilinear: $e(aQ, bR) = e(Q, R)^{ab}$ for all $Q, R \in G_1$ and all $a, b \in \mathbb{Z}$.

2. Non-degenerate: $e(Q, R) \neq 1$ for some $Q, R \in G_1$.

3. Computable: There is an efficient algorithm to compute $e(Q, R)$ for any $Q, R \in G_1$.

Given an admissible pairing $e$, the decisional Diffie-Hellman (DDH) problem in $G_1$ can be solved easily: since $e(aP, bP) = e(P, abP)$, a tuple $(P, aP, bP, cP)$ is a valid Diffie-Hellman tuple if and only if $e(aP, bP) = e(P, cP)$.
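The pairing test above can be checked mechanically. The Python sketch below assumes an insecure toy "exponent" model of a symmetric pairing: a point $aP \in G_1$ is stored as the integer $a \bmod q$ and an element $e(P,P)^c \in G_2$ as $c \bmod q$, so bilinearity holds by construction. The prime $q = 2^{127}-1$ and all function names are illustrative assumptions; since discrete logarithms are trivial in this representation, the model checks the algebra only and says nothing about hardness.

```python
import secrets

q = 2**127 - 1  # toy prime group order (a Mersenne prime); illustrative only
P = 1           # the generator P of G1, stored as its discrete log

def pair(x: int, y: int) -> int:
    """Toy pairing e(xP, yP) = e(P, P)^(xy), stored as the exponent xy mod q."""
    return x * y % q

def is_dh_tuple(p: int, a_p: int, b_p: int, c_p: int) -> bool:
    """Decide whether (P, aP, bP, cP) is a Diffie-Hellman tuple via e(aP, bP) == e(P, cP)."""
    return pair(a_p, b_p) == pair(p, c_p)

a = secrets.randbelow(q - 1) + 1
b = secrets.randbelow(q - 1) + 1
print(is_dh_tuple(P, a, b, a * b % q))        # True: c = ab
print(is_dh_tuple(P, a, b, (a * b + 1) % q))  # False: c != ab
```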

Definition 1. A prime order group G is a GDH group if there exists an efficient 
algorithm which solves the DDH problem in G and there is no polynomial time 
algorithm which solves the CDH problem. 

A GDH parameter generator $\mathcal{IG}$ is a randomized algorithm that takes a security parameter $k \in \mathbb{N}$, runs in time polynomial in $k$, and outputs the description of two groups $G_1$ and $G_2$ of the same prime order $q$ and the description of an admissible pairing $e : G_1 \times G_1 \to G_2$. We say that $\mathcal{IG}$ satisfies the GDH assumption if the following probability is negligible (in $k$) for all PPT algorithms $A$:

$$\Pr\left[A(G_1, G_2, e, P, aP, bP) = abP \;\middle|\; (G_1, G_2, e) \leftarrow \mathcal{IG}(1^k),\; P \leftarrow G_1^*,\; a, b \leftarrow \mathbb{Z}_q^*\right].$$

As noted in [8], a BDH parameter generator $\mathcal{IG}_{BDH}$ [2] satisfying the BDH assumption can also be viewed as a GDH parameter generator $\mathcal{IG}_{GDH}$ satisfying the GDH assumption, because the BDH assumption is stronger than the GDH assumption.

2.2 ID-Based Signature Scheme of Cha and Cheon 

Recently, Cha and Cheon [8] proposed an IBS scheme which is not only effi- 
cient but also provably secure in the random oracle model assuming that the 
underlying group is GDH. This scheme consists of the following algorithms: 

IBS.Setup: Choose a generator $P$ of $G_1$, pick a random $s \in \mathbb{Z}/q\mathbb{Z}$ and set $P_{pub} = sP$. Choose hash functions $H_1 : \{0,1\}^* \to G_1$ and $H_3 : \{0,1\}^* \times G_1 \to \mathbb{Z}/q\mathbb{Z}$. The system parameters are $(G_1, G_2, e, P, P_{pub}, H_1, H_3)$ and the master secret key is $s$.

IBS.Extr: Given an identity ID, compute the public key $Q_{ID} = H_1(ID)$ and output the secret key $D_{ID} = sQ_{ID}$ associated to ID.

IBS.Sign: Given a secret key $D_{ID}$ and a message $m$, pick a random number $r \in \mathbb{Z}/q\mathbb{Z}$ and output a signature $\sigma = (U, V)$, where $U = rQ_{ID}$, $h = H_3(m, U)$ and $V = (r + h)D_{ID}$.

IBS.Vrfy: To verify a signature $\sigma = (U, V)$ of a message $m$ for an identity ID, check whether $(P, P_{pub}, U + hQ_{ID}, V)$, where $h = H_3(m, U)$, is a valid Diffie-Hellman tuple.
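The four algorithms can be exercised end to end in an insecure toy exponent model (each point of $G_1$ stored as its discrete logarithm to the base $P$, and $e(xP, yP)$ stored as $xy \bmod q$). The hash helper `H`, its tag strings and the choice $q = 2^{127}-1$ are assumptions made for illustration; the sketch only confirms that an honestly generated signature satisfies the verification equation and is not a usable implementation.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    # toy pairing: e(xP, yP) = e(P, P)^(xy), kept as the exponent xy mod q
    return x * y % q

def H(tag, *parts):
    # hypothetical hash; its output in Z_q also stands for a point of G1 (H1)
    data = "|".join([tag, *map(str, parts)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def rand_zq():
    return secrets.randbelow(q - 1) + 1  # uniform in [1, q - 1]

def ibs_setup():
    s = rand_zq()                 # master secret key s
    return s, s * P % q           # (s, P_pub = sP)

def ibs_extract(s, identity):
    q_id = H("H1", identity)      # Q_ID = H1(ID)
    return s * q_id % q           # D_ID = s Q_ID

def ibs_sign(identity, d_id, m):
    q_id = H("H1", identity)
    r = rand_zq()
    u = r * q_id % q              # U = r Q_ID
    h = H("H3", m, u)             # h = H3(m, U)
    v = (r + h) * d_id % q        # V = (r + h) D_ID
    return u, v

def ibs_verify(p_pub, identity, m, sig):
    u, v = sig
    q_id = H("H1", identity)
    h = H("H3", m, u)
    # (P, P_pub, U + h Q_ID, V) is a DH tuple iff e(P_pub, U + h Q_ID) == e(P, V)
    return pair(p_pub, (u + h * q_id) % q) == pair(P, v)

s, p_pub = ibs_setup()
d_alice = ibs_extract(s, "alice@example.com")
sig = ibs_sign("alice@example.com", d_alice, "hello")
print(ibs_verify(p_pub, "alice@example.com", "hello", sig))  # True
```

Both sides of the check equal $(r+h)\,s\,Q_{ID}$ in exponent form, which is exactly why the tuple is a Diffie-Hellman tuple.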



2.3 BLS Signature and Multisignature Schemes 

Here, we introduce the pairing-based signature scheme of Boneh, Lynn and Shacham [4] and the multisignature scheme of Boldyreva [5]. Let $G_1$ be a GDH group of prime order $q$ and let $P$ be a generator of $G_1$. The global information PARAMS contains $P$, $q$ and a description of a hash function $H$ mapping arbitrary strings to elements of $G_1^*$. The BLS signature scheme consists of the following algorithms:

BLS.Key: Given PARAMS, choose a random element $x \in \mathbb{Z}_q^*$ and return the pair $(SK, PK) = (x, xP)$.

BLS.Sign: Given a secret key $SK = x$ and a message $m \in \{0,1\}^*$, compute $H(m)$ and return the signature $\sigma = xH(m)$.

BLS.Vrfy: To verify a signature $\sigma$ of a message $m$, check whether $(P, PK, H(m), \sigma)$ is a valid Diffie-Hellman tuple. If it is, return 1; otherwise return 0.
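A minimal sketch of the three BLS algorithms in the same insecure toy exponent model (points of $G_1$ stored as discrete logarithms, pairings as products of exponents). The function `H` standing in for the hash onto $G_1^*$, the parameter $q = 2^{127}-1$ and all function names are illustrative assumptions. The Diffie-Hellman tuple test is written as the pairing equation $e(PK, H(m)) = e(P, \sigma)$.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    return x * y % q  # toy pairing, result kept as an exponent of e(P, P)

def H(m):
    # hypothetical hash onto G1*, represented by a nonzero exponent
    d = int.from_bytes(hashlib.sha256(m.encode()).digest(), "big")
    return d % (q - 1) + 1

def bls_key():
    x = secrets.randbelow(q - 1) + 1
    return x, x * P % q            # (SK, PK) = (x, xP)

def bls_sign(x, m):
    return x * H(m) % q            # sigma = x H(m)

def bls_verify(pk, m, sigma):
    # (P, PK, H(m), sigma) is a DH tuple iff e(PK, H(m)) == e(P, sigma)
    return 1 if pair(pk, H(m)) == pair(P, sigma) else 0

sk, pk = bls_key()
print(bls_verify(pk, "msg", bls_sign(sk, "msg")))  # 1
```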

Suppose $n$ users each have a secret and public key pair $(SK_i, PK_i)$ obtained by running BLS.Key. For simplicity, we assign the members consecutive integer identities $1, 2, \ldots, n$. Suppose user $i$ signs a message $m \in \{0,1\}^*$ to obtain the signature $\sigma_i$ via BLS.Sign. The multisignature of an arbitrary subset $L \subseteq [n]$ is computed simply as $\sigma = \sum_{i \in L} \sigma_i$, the group operation of $G_1$ being written additively. To verify the multisignature $\sigma$, given the public keys of all users in $L$, compute $PK_L = \sum_{i \in L} PK_i$ and check whether $(P, PK_L, H(m), \sigma)$ is a valid Diffie-Hellman tuple. If it is, return 1; otherwise return 0.

Both the BLS scheme and the induced multisignature scheme are proven secure in the random oracle model, assuming that the underlying group $G_1$ is GDH.
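The multisignature is simply the sum of the individual BLS signatures in $G_1$, checked against the sum of the public keys. A self-contained sketch under the same insecure toy exponent model; names, the hash encoding and $q = 2^{127}-1$ are illustrative assumptions.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    return x * y % q

def H(m):
    d = int.from_bytes(hashlib.sha256(m.encode()).digest(), "big")
    return d % (q - 1) + 1          # hypothetical hash onto G1*

def bls_key():
    x = secrets.randbelow(q - 1) + 1
    return x, x * P % q

def bls_sign(x, m):
    return x * H(m) % q

def multisign(partial_sigs):
    return sum(partial_sigs) % q               # sigma = sum_{i in L} sigma_i

def multiverify(pks, m, sigma):
    pk_l = sum(pks) % q                        # PK_L = sum_{i in L} PK_i
    return pair(pk_l, H(m)) == pair(P, sigma)  # DH-tuple test on (P, PK_L, H(m), sigma)

keys = [bls_key() for _ in range(3)]           # users 1, 2, 3
L = keys[:2]                                   # subset L = {1, 2}
sigma = multisign(bls_sign(sk, "m") for sk, _ in L)
print(multiverify([pk for _, pk in L], "m", sigma))  # True
```

Verifying against the keys of all three users fails, since $PK_L$ must match exactly the subset that signed.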

2.4 The Model 

We now provide a formal definition of certificate-based signature schemes and their security. Our definition parallels Gentry’s definition of a CBE scheme. As stated in [11], the scheme need not be “certificate updating”. The two main entities involved in CBS are a certifier and a user. This model does not require a secure channel between the two entities.

Definition 2. A certificate-updating certificate-based signature scheme consists 
of the following algorithms: 

CBS.Gen$_{IBS}$, the IBS key generation algorithm, takes as input a security parameter $1^{k_1}$ and (optionally) the total number of time periods $t$. It returns $SK_C$ (the certifier’s master secret) and public parameters PARAMS that include a public key $PK_C$ and the description of a string space $S$.

CBS.Gen$_{PKS}$, the PKS key generation algorithm, takes as input a security parameter $1^{k_2}$ and (optionally) the total number of time periods $t$. It returns a secret key $SK_U$ and a public key $PK_U$ (the user’s secret and public keys).

CBS.Upd1, the certifier update algorithm, takes as input $SK_C$, PARAMS, $i$, a string $s \in S$ and $PK_U$ at the start of time period $i$. It returns $\mathrm{Cert}'_i$, which is sent to the user.

CBS.Upd2, the user update algorithm, takes as input PARAMS, $i$, $\mathrm{Cert}'_i$ and (optionally) $\mathrm{Cert}_{i-1}$ at the start of time period $i$. It returns $\mathrm{Cert}_i$.

CBS.Sign, the signature generation algorithm, takes $(m, \text{PARAMS}, \mathrm{Cert}_i, SK_U)$ as input in time period $i$. It computes the temporary signing key $SK = f(SK_U, \mathrm{Cert}_i)$, where $f$ is a public algorithm, and outputs a signature $\sigma$.

CBS.Vrfy, the verification algorithm, takes $(\sigma, m, i, PK_C, PK_U)$ as input and outputs a binary value 0 (invalid) or 1 (valid).

As in the formal model for CBE, CBS is designed as a combination of PKS and IBS, where the signer needs both its personal secret key and a certificate from the CA to sign. The string $s$ includes the message that the certifier signs and may vary depending on the scheme.
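Read as an API, Definition 2 fixes the inputs and outputs of six algorithms. The following Python `Protocol` is a hypothetical transcription of those signatures (the method names and the use of `Any` for keys, certificates and group elements are our assumptions); it carries no cryptography and only records which party supplies which input.

```python
from typing import Any, Optional, Protocol, Tuple

class CertificateBasedSignature(Protocol):
    """Hypothetical interface mirroring the six algorithms of Definition 2."""

    def gen_ibs(self, k1: int, t: Optional[int] = None) -> Tuple[Any, Any]:
        """Certifier key generation: returns (SK_C, PARAMS); PARAMS includes PK_C and S."""
        ...

    def gen_pks(self, k2: int, t: Optional[int] = None) -> Tuple[Any, Any]:
        """User key generation: returns (SK_U, PK_U)."""
        ...

    def upd1(self, sk_c: Any, params: Any, i: int, s: bytes, pk_u: Any) -> Any:
        """Certifier update at the start of period i: returns Cert'_i, sent to the user."""
        ...

    def upd2(self, params: Any, i: int, cert_prime: Any,
             prev_cert: Optional[Any] = None) -> Any:
        """User update at the start of period i: returns Cert_i."""
        ...

    def sign(self, m: bytes, params: Any, cert_i: Any, sk_u: Any) -> Any:
        """Signs m in period i with the temporary key SK = f(SK_U, Cert_i)."""
        ...

    def verify(self, sigma: Any, m: bytes, i: int, pk_c: Any, pk_u: Any) -> int:
        """Returns 1 (valid) or 0 (invalid)."""
        ...
```

Note that only `upd1` touches $SK_C$ and only `sign` touches $SK_U$, which is the split of secrets that the two security games below exploit.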



Security. Roughly speaking, as in CBE, we are concerned with two different types of attacks: by an uncertified user and by the certifier. We want CBS to be secure against each of these entities, even though each essentially holds half of the secret information needed to sign. Accordingly, we define two different games, and the adversary chooses which one to play. In Game 1, the adversary essentially assumes the role of an uncertified user. After proving knowledge of the secret key corresponding to its claimed public key, it can make CBS.Sign and CBS.Upd1 queries. In Game 2, the adversary essentially assumes the role of the certifier. After proving knowledge of the master secret corresponding to its claimed PARAMS, it can make CBS.Sign queries. Let PID $= (i, PK_C, PK_U, \text{U'sInfo})$ play the role of user U’s ID in IBC; we call it a pseudo ID.

Game 1: The challenger runs CBS.Gen$_{IBS}(1^{k_1}, t)$ and gives PARAMS to the adversary. The adversary then issues CBS.Cert and CBS.Sign queries. These queries are answered as follows:

• On certification query (PID, $SK_U$), the challenger checks that $SK_U$ is the secret key corresponding to $PK_U$ in PID. If so, it runs CBS.Upd1 and returns $\mathrm{Cert}'_i$; otherwise it returns $\perp$.

• On sign query (PID, $SK_U$, $m$), the challenger checks that $SK_U$ is the secret key corresponding to $PK_U$ in PID. If so, it generates $\mathrm{Cert}_i$ and outputs a valid signature CBS.Sign$(m, \text{PARAMS}, \mathrm{Cert}_i, SK_U)$; otherwise it returns $\perp$.

The adversary outputs (PID, $m$, $\sigma$), where PID $= (i, PK_C, PK_A, \text{A'sInfo})$, $m$ is a message and $\sigma$ is a signature, such that PID and (PID, $m$) are not equal to the inputs of any query to CBS.Cert and CBS.Sign, respectively. The adversary wins the game if $\sigma$ is a valid signature of $m$ for $i$.
Game 2: The challenger runs CBS.Gen$_{PKS}(1^{k_2}, t)$ and gives $PK_U$ to the adversary. The adversary then issues CBS.Sign queries.

• On CBS.Sign query (PID, $SK_C$, PARAMS, $m$), the challenger checks that $SK_C$ is the secret key corresponding to $PK_C$ in PARAMS. If so, it generates $\mathrm{Cert}_i$ and outputs a valid signature CBS.Sign$(m, \text{PARAMS}, \mathrm{Cert}_i, SK_U)$; otherwise it returns $\perp$.

The adversary outputs (PID, $m$, $\sigma$) such that (PID, $m$) is not equal to the inputs of any query to CBS.Sign. The adversary wins the game if $\sigma$ is a valid signature of $m$ for $i$.



Definition 3. A certificate-updating certificate-based signature scheme is secure against existential forgery under adaptively chosen message and pseudo ID attacks if no PPT adversary has a non-negligible advantage in either Game 1 or Game 2.

3 Concrete Certificate-Based Signature Schemes 

We describe two concrete certificate-based signature schemes, called CBSm and CBSa. Both use the same IBS scheme, but for the temporary signing key the former uses a multisignature and the latter an aggregate signature¹. Let $k$ be the security parameter given to the setup algorithm, and let $\mathcal{IG}$ be a GDH parameter generator. Both schemes share the same setup and certificate update algorithms.

CBS.Setup: The CA runs $\mathcal{IG}$ on input $k$ to generate groups $G_1, G_2$ of some prime order $q$ and an admissible pairing $e : G_1 \times G_1 \to G_2$. It then picks an arbitrary generator $P \in G_1$ and a random secret $s_C \in \mathbb{Z}/q\mathbb{Z}$, and sets $PK_C = s_C P$. It chooses cryptographic hash functions $H_1 : \{0,1\}^* \to G_1$ and $H_3 : \{0,1\}^* \times G_1 \to \mathbb{Z}/q\mathbb{Z}$.

The system parameters are PARAMS $= (G_1, G_2, e, P, PK_C, H_1, H_3)$ and the CA’s master secret key is $SK_C = s_C \in \mathbb{Z}/q\mathbb{Z}$. The CA uses its parameters and its secret to issue certificates. Alice computes a secret and public key pair $(SK_A, PK_A) = (s_A, s_A P)$ according to the parameters issued by the CA.

CBS.Cert: Alice obtains a certificate from her CA as follows.

1. Alice sends Alice'sInfo to the CA, which includes her public key $s_A P$ and any necessary additional identifying information, such as her name.

2. The CA verifies Alice’s information.

3. If satisfied, the CA computes $P_A = H_1(i, PK_C, PK_A, \text{Alice'sInfo}) \in G_1$ in period $i$.

4. The CA then computes $\mathrm{Cert}_A = s_C P_A$ and sends this certificate to Alice.

¹ We refer the reader to [3] for a discussion of multisignatures and aggregate signatures.
In CBSm, before signing a message $m \in \{0,1\}^*$, Alice signs Alice'sInfo, producing $s_A P_A$, and then computes $S_A = s_C P_A + s_A P_A = \mathrm{Cert}_A + s_A P_A$, which is a two-person multisignature. Alice uses this multisignature as her temporary signing key.

CBSm.Sign: To sign $m \in \{0,1\}^*$ using Alice'sInfo, Alice picks a random $r \in \mathbb{Z}/q\mathbb{Z}$ and outputs a signature $\sigma = (U, V)$, where $U = rP_A$, $h = H_3(m, U)$ and $V = (r + h)S_A = (r + h)(s_C + s_A)P_A$.

CBSm.Vrfy: To verify a signature $\sigma = (U, V)$ of a message $m$, check whether $e(s_C P + s_A P, U + hP_A) = e(P, V)$, where $h = H_3(m, U)$.
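The following sketch runs CBS.Setup, CBS.Cert, the temporary-key computation and CBSm.Sign/CBSm.Vrfy in an insecure toy exponent model (points of $G_1$ stored as discrete logarithms to the base $P$, $e(xP, yP)$ stored as $xy \bmod q$), to show how the verification equation combines $PK_C + PK_A$ with $P_A$. The helper names, the string encoding of the hash inputs, the example Alice'sInfo and $q = 2^{127}-1$ are illustrative assumptions; no real pairing or hash-to-curve is performed.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    return x * y % q  # toy e(xP, yP), kept as the exponent xy mod q

def H(tag, *parts):
    data = "|".join([tag, *map(str, parts)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def keygen():
    s = secrets.randbelow(q - 1) + 1
    return s, s * P % q                      # (s, sP), used by both the CA and Alice

def h1_point(i, pk_c, pk_a, info):
    return H("H1", i, pk_c, pk_a, info)      # P_A = H1(i, PK_C, PK_A, Alice'sInfo)

def cert(s_c, pk_c, pk_a, info, i):
    return s_c * h1_point(i, pk_c, pk_a, info) % q              # Cert_A = s_C P_A

def temp_key(cert_a, s_a, pk_c, pk_a, info, i):
    return (cert_a + s_a * h1_point(i, pk_c, pk_a, info)) % q   # S_A = Cert_A + s_A P_A

def cbsm_sign(s_key, pk_c, pk_a, info, i, m):
    p_a = h1_point(i, pk_c, pk_a, info)
    r = secrets.randbelow(q - 1) + 1
    u = r * p_a % q                          # U = r P_A
    h = H("H3", m, u)                        # h = H3(m, U)
    return u, (r + h) * s_key % q            # V = (r + h) S_A

def cbsm_verify(pk_c, pk_a, info, i, m, sig):
    u, v = sig
    p_a = h1_point(i, pk_c, pk_a, info)
    h = H("H3", m, u)
    return pair((pk_c + pk_a) % q, (u + h * p_a) % q) == pair(P, v)

s_c, pk_c = keygen()                          # CA
s_a, pk_a = keygen()                          # Alice
info, i = "Alice;alice@example.com", 7        # hypothetical Alice'sInfo and period
s_key = temp_key(cert(s_c, pk_c, pk_a, info, i), s_a, pk_c, pk_a, info, i)
sig = cbsm_sign(s_key, pk_c, pk_a, info, i, "pay 10")
print(cbsm_verify(pk_c, pk_a, info, i, "pay 10", sig))  # True
```

In this sketch a signature produced in period 7 fails verification for period 8, since $P_A$ depends on $i$; a signer who is not re-certified for a new period therefore cannot produce signatures that verify for it.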

In CBSa, before signing a message $m \in \{0,1\}^*$, Alice also signs Alice'sInfo, producing $s_A P'_A$, where $P'_A = H_1(\text{Alice'sInfo})$. She then computes her temporary signing key $S_A = s_C P_A + s_A P'_A = \mathrm{Cert}_A + s_A P'_A$, which is a two-person aggregate signature.

CBSa.Sign: To sign $m \in \{0,1\}^*$ using Alice'sInfo, Alice does the following:

1. Computes $P_A = H_1(i, PK_C, s_A P, \text{Alice'sInfo}) \in G_1$.

2. Picks a random $r \in \mathbb{Z}/q\mathbb{Z}$ and outputs a signature $\sigma = (U_1, U_2, V)$, where $U_1 = rP_A$, $U_2 = rP'_A$, $h = H_3(m, U_1, U_2)$ and $V = (r + h)S_A = (r + h)(s_C P_A + s_A P'_A)$.

CBSa.Vrfy: To verify a signature $\sigma = (U_1, U_2, V)$ of a message $m$, check whether $e(PK_C, U_1 + hP_A) \cdot e(PK_A, U_2 + hP'_A) = e(P, V)$, where $h = H_3(m, U_1, U_2)$.
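A companion sketch for CBSa in the same insecure toy exponent model. Because a product in $G_2$ becomes a sum of exponents in this representation, the left-hand side of CBSa.Vrfy is computed as the sum of two toy pairings, and three pairing evaluations appear in total. Names, encodings and $q = 2^{127}-1$ are illustrative assumptions.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    return x * y % q

def H(tag, *parts):
    data = "|".join([tag, *map(str, parts)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def rand_zq():
    return secrets.randbelow(q - 1) + 1

def points(i, pk_c, pk_a, info):
    p_a = H("H1", i, pk_c, pk_a, info)       # P_A  = H1(i, PK_C, s_A P, Alice'sInfo)
    p_a_prime = H("H1", info)                # P'_A = H1(Alice'sInfo)
    return p_a, p_a_prime

def cbsa_temp_key(cert_a, s_a, info):
    return (cert_a + s_a * H("H1", info)) % q   # S_A = Cert_A + s_A P'_A

def cbsa_sign(s_key, pk_c, pk_a, info, i, m):
    p_a, p_a_prime = points(i, pk_c, pk_a, info)
    r = rand_zq()
    u1, u2 = r * p_a % q, r * p_a_prime % q     # U1 = r P_A, U2 = r P'_A
    h = H("H3", m, u1, u2)                      # h = H3(m, U1, U2)
    return u1, u2, (r + h) * s_key % q          # V = (r + h) S_A

def cbsa_verify(pk_c, pk_a, info, i, m, sig):
    u1, u2, v = sig
    p_a, p_a_prime = points(i, pk_c, pk_a, info)
    h = H("H3", m, u1, u2)
    # the product of two pairings in G2 is a sum of exponents in this toy model
    lhs = (pair(pk_c, (u1 + h * p_a) % q) + pair(pk_a, (u2 + h * p_a_prime) % q)) % q
    return lhs == pair(P, v)                    # three pairings in total

s_c, s_a = rand_zq(), rand_zq()
pk_c, pk_a = s_c * P % q, s_a * P % q
info, i = "Alice;alice@example.com", 7          # hypothetical Alice'sInfo and period
cert_a = s_c * points(i, pk_c, pk_a, info)[0] % q   # Cert_A = s_C P_A, issued by the CA
sig = cbsa_sign(cbsa_temp_key(cert_a, s_a, info), pk_c, pk_a, info, i, "pay 10")
print(cbsa_verify(pk_c, pk_a, info, i, "pay 10", sig))  # True
```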

Note that CBSm can be vulnerable to the following “chosen-key” attack [3], inherited from multisignatures. If a malicious signer A with a secret and public key pair $(s_A, s_A P)$ sets $PK'_A = s_A P - s_C P$ as its public key, whose corresponding secret key it does not know, then A can generate a valid temporary signing key $S_A = s_A P_A \,(= \mathrm{Cert}_A + (s_A - s_C)P_A)$ by itself, without a valid certificate for period $i$. To prevent this attack, CBSm requires one additional assumption: the signer must provide a separate proof that it knows the secret key corresponding to its claimed public key, and the verifier must check this proof. The verifier only has to do this once over the lifetime of the signer’s public key, so this extra verification cost can be amortized. For example, the notion of “long-lived certificate” in [11] can be used directly as the separate proof. This auxiliary assumption prevents malicious users from mounting chosen-key attacks and, moreover, is not a burden on the implementation of CBSm.
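The attack is easy to reproduce in the insecure toy exponent model used for the CBSm sketch: the adversary never touches $s_C$ or a certificate, yet its signature passes CBSm.Vrfy under the rogue public key $PK'_A = s_A P - PK_C$. Function names and encodings are illustrative assumptions; the proof-of-knowledge countermeasure is not modelled, since discrete logarithms are trivial in this toy setting.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    return x * y % q

def H(tag, *parts):
    data = "|".join([tag, *map(str, parts)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def cbsm_verify(pk_c, pk_a, info, i, m, sig):
    u, v = sig
    p_a = H("H1", i, pk_c, pk_a, info)               # P_A = H1(i, PK_C, PK_A, info)
    h = H("H3", m, u)
    return pair((pk_c + pk_a) % q, (u + h * p_a) % q) == pair(P, v)

def rogue_key_forgery(pk_c, info, i, m):
    """Chosen-key attack: uses only the CA's public key PK_C."""
    s_a = secrets.randbelow(q - 1) + 1
    pk_rogue = (s_a * P - pk_c) % q                  # PK'_A = s_A P - s_C P
    p_a = H("H1", i, pk_c, pk_rogue, info)
    s_key = s_a * p_a % q                            # S_A = s_A P_A, no Cert_A needed
    r = secrets.randbelow(q - 1) + 1
    u = r * p_a % q
    h = H("H3", m, u)
    return pk_rogue, (u, (r + h) * s_key % q)

s_c = secrets.randbelow(q - 1) + 1                   # CA secret, never given to the attacker
pk_c = s_c * P % q
pk_rogue, forged = rogue_key_forgery(pk_c, "Mallory", 7, "pay 10")
print(cbsm_verify(pk_c, pk_rogue, "Mallory", 7, "pay 10", forged))  # True
```

The forgery works because $PK_C + PK'_A = s_A P$, so the verifier's combined key collapses to a key the attacker fully controls.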

On the other hand, CBSa uses temporary signing keys obtained from aggregate signatures instead of multisignatures. This prevents the chosen-key attack even without the above assumption, and CBSa is provably secure under the security notion of Subsection 2.4. However, its verification is slower than that of CBSm (3 pairing computations are needed instead of 2).

Remark 1. As stated above, even though CBSm relies on an auxiliary assumption, this is quite acceptable since the extra cost can be amortized. Furthermore, CBSm immediately and efficiently yields proxy signatures in conventional PKIs. We therefore focus on the multisignature-based CBS (i.e., CBSm) for the rest of this paper.
Remark 2. The IBS scheme of Hess [12] can be used in place of the scheme of Cha and Cheon, but the latter is in general more efficient than the former [7].

Security Proof. A pseudo ID is the input value of $H_1$ used to derive a certificate from the CA in period $i$. We modify the notion of security in Subsection 2.4 so that it is appropriate for CBSm. An adversary in Game 2 is also allowed to make BLS.Sign queries. In addition, we must prevent a signature forged by the chosen-key attack from being accepted as valid. Thus, a forged signature (PID, $m$, $\sigma$) is accepted as valid only when it comes with the secret key corresponding to the public key in PID. Without loss of generality, we say that a certificate-based signature scheme is secure against existential forgery under adaptively chosen message and pseudo ID attacks if no PPT adversary has a non-negligible advantage in either of the following games:

Game 1: As Game 1 in Subsection 2.4, except for the notion of forged signature validity.

Game 2: In addition to the queries of Game 2 in Subsection 2.4, the adversary is allowed to issue BLS.Sign queries. These queries are answered as follows:

• On BLS sign query (PID, $SK_C$), the challenger checks that $SK_C$ is the secret key corresponding to the public key $PK_C$ in PARAMS. If so, it returns BLS.Sign$(SK_U, \text{PID})$; otherwise it returns $\perp$.

The adversary outputs (PID, $m$, $\sigma$) and $SK_C$. The output is valid when PID and (PID, $m$) are not equal to the inputs of any query to BLS.Sign and CBS.Sign, respectively, and $SK_C$ is the secret key corresponding to the public key in PID. The adversary wins the game if $\sigma$ is a valid signature of $m$ for PID.

For notational purposes, the result of a CBS.Sign query will be denoted by (PID, $m$, $U$, $h$, $V$), where $(U, V)$ is the output of the signing algorithm of our scheme and $h = H_3(m, U)$.

Theorem 1. Our certificate-based signature scheme is secure against existential forgery under adaptively chosen message and pseudo ID attacks, assuming the underlying group is GDH.

The proof of the above theorem is in the full version of this paper [14]. 

Theorem 2. An aggregate signature based CBS scheme (CBSa) is also secure 
against existential forgery under adaptively chosen message and pseudo ID at- 
tacks. 

As stated above, this theorem can be proved exactly under the security notion 
in subsection 2.4. The proof of the above theorem is in the full version of this 
paper [14]. 

Remark 3. As with certificate-based encryption, adapting CBS to a hierarchy of CAs is fairly straightforward. In this case, aggregate signatures must be used instead of multisignatures, because the CAs themselves must be certified.
4 Proxy Signature Schemes

Next, we show an application of CBS to proxy signatures. The concept of proxy signatures was first introduced by Mambo, Usuda and Okamoto in 1996. A proxy signature scheme, which involves an original signer, a proxy signer and verifiers, allows the original signer to delegate its signing capability to the proxy signer, who then signs messages on its behalf. From a proxy signature, anyone can check both the original signer’s delegation and the proxy signer’s digital signature.

Recently, Boldyreva, Palacio and Warinschi [6] formalized a notion of security for proxy signatures and showed that secure proxy signature schemes can be derived from secure standard signature schemes. However, they focus on the case where a single digital signature scheme is used for standard signing, proxy designation and proxy signing simultaneously.



4.1 Definition and Security Notion of Proxy Signature Schemes 

The basic idea for implementing a secure proxy signature scheme is that the original signer creates a signature on the delegation information (the warrant²), and the proxy signer then uses it to generate a proxy secret key and signs the delegated message. Since the proxy key pair is generated using the original signer’s signature on the delegation information, any verifier can check the original signer’s agreement from the proxy signature. For simplicity, let users be identified by natural numbers, let $PK_i$ denote the public key of user $i \in \mathbb{N}$, and let $SK_i$ denote the corresponding secret key. A proxy signature scheme then consists of eight algorithms. Three of them, PS.Key, S.Sign and S.Vrfy, are as in ordinary signature schemes. The other five provide the proxy signature capability.

(PS.Del, PS.Pro), a pair of interactive algorithms forming the proxy designation protocol, takes $(PK_i, PK_j)$ as common input for the original signer $i$ and the proxy signer $j$. In addition, PS.Del and PS.Pro take $SK_i$ and $SK_j$ as input, respectively. As a result of the interaction, PS.Pro outputs a proxy signing key $SKP$.

PS.Sign, the proxy signature generation algorithm, takes $(m, SKP)$ as input and outputs a proxy signature $p\sigma$.

PS.Vrfy, the proxy verification algorithm, takes $(p\sigma, m, PK_i)$ as input and outputs 0 (invalid) or 1 (valid).

PS.Iden, the proxy identification algorithm, takes $p\sigma$ as input and outputs $PK_j$.

For all messages $m$ and all users $i, j \in \mathbb{N}$, if $SKP$ is a proxy signing key for user $j$ on behalf of user $i$, then PS.Vrfy(PS.Sign$(m, SKP)$, $m$, $PK_i$) $= 1$ and PS.Iden(PS.Sign$(m, SKP)$) $= PK_j$.

Chosen message attack capabilities are modeled by giving the adversary access to two oracles: a standard signing oracle and a proxy signing oracle. The first oracle takes as input a message $m$ and returns a standard signature for $m$ by user 1. The second oracle takes as input a tuple $(i, \ell, m)$ and, if user 1 was designated by user $i$ at least $\ell$ times, returns a proxy signature for $m$ created by user 1 on behalf of user $i$, using the $\ell$-th proxy signing key. The goal of the adversary is to produce one of the following forgeries:

1. a standard signature by user 1 for a message that was not submitted to the standard signing oracle;

2. a proxy signature for a message $m$ by user 1 on behalf of some user $i$ such that either user $i$ never designated user 1 or $m$ was not in any query $(i, \ell, m)$ made to the proxy signing oracle; or

3. a proxy signature for a message $m$ by some user $i$ on behalf of user 1, such that user $i$ was never designated by user 1.

² A warrant is a message containing the public key of the designated proxy signer and possibly restrictions on the messages the proxy signer is allowed to sign.

We refer the reader to [6] for the notion of security for delegation-by- 
certificate proxy signatures. 

4.2 A Concrete Proxy Signature Scheme 

We construct a delegation-by-certificate proxy signature scheme derived from CBS. Contrary to the examples in [6], the scheme we use for standard signing (BLS) differs from the one used for proxy signing. That is, our proxy signature scheme employs the BLS signature scheme for standard signing and for delegation, and an IBS scheme for proxy signing.

Assume that there are two participants, Charlie and Alice, with secret and public key pairs $(s_C, s_C P)$ and $(s_A, s_A P)$ respectively, and that they share the common system parameters PARAMS $= (G_1, G_2, e, P,$

S.Sign: A standard signature for a message $m$ is obtained by signing the result using BLS.Sign.

S.Vrfy: The verification of a signature $\sigma$ for a message $m$ is done by computing BLS.Vrfy.

(PS.Del, PS.Pro): To designate Alice as a proxy signer, Charlie simply sends Alice an appropriate warrant $w$ together with a signature $\mathrm{Cert}_A = s_C P_A$, where $P_A = H_1(PK_C, PK_A, w)$. Alice’s corresponding proxy signing key is $SKP_A = \mathrm{Cert}_A + s_A P_A$.

PS.Sign: A proxy signature for a message $m$ produced by Alice on behalf of Charlie contains a warrant $w$, the public key $PK_A$ of the proxy signer, and a signature $\sigma = (U, V)$, where $U = rP_A$ for a random $r \in \mathbb{Z}/q\mathbb{Z}$, $h = H_3(m, U)$ and $V = (r + h)SKP_A = (r + h)(s_C + s_A)P_A$.

PS.Vrfy: To verify a signature $(PK_C, m, (PK_A, w, \sigma))$, check whether $e(PK_C + PK_A, U + hP_A) = e(P, V)$, where $h = H_3(m, U)$.

PS.Iden: The identification algorithm is defined as PS.Iden$(PK_A, w, \sigma) = PK_A$.
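The proxy algorithms can be traced in the same insecure toy exponent model (points of $G_1$ as discrete logarithms, $e(xP, yP)$ as $xy \bmod q$); the structure is exactly that of CBSm, with Charlie in the role of the CA and the warrant $w$ in place of Alice'sInfo. All names, the example warrant and $q = 2^{127}-1$ are illustrative assumptions, and the interactive designation protocol is collapsed into two plain function calls.

```python
import hashlib
import secrets

q = 2**127 - 1  # toy prime group order (illustrative)
P = 1           # generator of G1, stored as its discrete log

def pair(x, y):
    return x * y % q

def H(tag, *parts):
    data = "|".join([tag, *map(str, parts)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def rand_zq():
    return secrets.randbelow(q - 1) + 1

def delegate(s_c, pk_c, pk_a, w):
    """PS.Del (Charlie): Cert_A = s_C P_A with P_A = H1(PK_C, PK_A, w)."""
    return s_c * H("H1", pk_c, pk_a, w) % q

def proxy_key(cert_a, s_a, pk_c, pk_a, w):
    """PS.Pro (Alice): SKP_A = Cert_A + s_A P_A."""
    return (cert_a + s_a * H("H1", pk_c, pk_a, w)) % q

def proxy_sign(skp_a, pk_c, pk_a, w, m):
    p_a = H("H1", pk_c, pk_a, w)
    r = rand_zq()
    u = r * p_a % q                           # U = r P_A
    h = H("H3", m, u)                         # h = H3(m, U)
    return pk_a, w, (u, (r + h) * skp_a % q)  # (PK_A, w, sigma) with V = (r + h) SKP_A

def proxy_verify(pk_c, m, proxy_sig):
    pk_a, w, (u, v) = proxy_sig
    p_a = H("H1", pk_c, pk_a, w)
    h = H("H3", m, u)
    return pair((pk_c + pk_a) % q, (u + h * p_a) % q) == pair(P, v)

def proxy_iden(proxy_sig):
    return proxy_sig[0]                       # PS.Iden returns PK_A

s_c, s_a = rand_zq(), rand_zq()               # Charlie (original signer), Alice (proxy)
pk_c, pk_a = s_c * P % q, s_a * P % q
w = "Alice may sign purchase orders until 2004-12-31"  # hypothetical warrant
skp = proxy_key(delegate(s_c, pk_c, pk_a, w), s_a, pk_c, pk_a, w)
psig = proxy_sign(skp, pk_c, pk_a, w, "order #1")
print(proxy_verify(pk_c, "order #1", psig), proxy_iden(psig) == pk_a)  # True True
```

Because $P_A$ is derived from $w$, altering the warrant inside a proxy signature makes verification fail, without the proxy signature carrying a separate signature on $w$.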

In our proxy signature scheme, the role of the CA in CBS schemes is taken over by the original signer, so trust in a certificate provider may be removed. Moreover, the signer’s information that the CA signs for certification in CBS schemes is instead issued by the original signer as a warrant for delegation. Thanks to this property of CBS, our proxy scheme does not need to include in the proxy signature a signature on the warrant under the original signer’s secret key. Nor does it require a secure channel for proxy designation [15]. The following theorem shows that our proxy scheme is secure under the security notion of [6].

Theorem 3. The scheme defined above is a secure proxy signature scheme in 
the random oracle model assuming that the underlying group is GDH. 

The proof of the above theorem is in the full version of this paper [14]. 

Remark 4. Recently, a proxy signature scheme based on the same idea was proposed by Zhang, Safavi-Naini and Lin [19]. It is based on the IBS scheme of Hess [12] and uses the BLS signature scheme for standard signatures and for certification of the warrant. Although their scheme also appears to satisfy the desired security conditions implicitly, it comes with no proof of security. Our work bridges this gap.

5 Conclusion 

In this paper, we defined the security notion of certificate-based signatures that use the same parameters and certificate revocation strategy as Gentry’s encryption scheme. We presented and compared two concrete CBS schemes and provided proofs of security in the random oracle model, assuming that the underlying group is GDH. Our scheme may be useful for constructing an efficient PKI in combination with Gentry’s CBE scheme. Additionally, we derived a concrete delegation-by-certificate proxy signature scheme from a certificate-based signature scheme through simple modifications and proved its security under the security notion defined by Boldyreva, Palacio and Warinschi.
Abstract. In this paper, we give the first example of an identity-based undeniable signature using pairings over elliptic curves. We extend the security model for the notions of invisibility and anonymity given by Galbraith and Mao in 2003 to the identity-based setting, and we prove that our scheme is existentially unforgeable under the Bilinear Diffie-Hellman assumption in the random oracle model. We also prove that it has the invisibility property under the Decisional Bilinear Diffie-Hellman assumption, and we discuss the efficiency of the scheme.

Keywords. ID-based cryptography, undeniable signatures, pairings, 
provable security. 



1 Introduction 

Identity-based public key cryptography is a paradigm proposed by Shamir in 1984 ([37]) to simplify key management and remove the need for public key certificates. To achieve this, the user’s public key is taken to be information that identifies him unambiguously (e-mail address, IP address, social security number...). Removing certificates avoids the trust problems encountered in today’s public key infrastructures (PKIs). This kind of cryptosystem involves trusted authorities called private key generators (PKGs), which deliver private keys to users after computing them from the users’ identity information (users do not generate their key pairs themselves) and from a master secret key. End users do not have to request a certificate for their public key. Although certificates are not completely removed (the PKG’s public key still has to be certified, since it is involved in every encryption or signature verification operation), their use is drastically reduced, since many users depend on the same authority. Several practical identity-based signature schemes (IBS) have appeared since 1984 ([23], [17], [36]), but a practical identity-based encryption scheme (IBE) was only found in 2001 ([4]) by Boneh and Franklin, who took advantage of the properties of suitable bilinear maps (the Weil or Tate pairing) over supersingular elliptic curves. Many other identity-based primitives based on pairings were proposed after 2001: digital signatures, authenticated key exchange, non-interactive key agreement, blind and ring signatures, signcryption, ... ([7], [9], [15], [25], [32], [38], [39], ...).

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 112-125, 2004. 
Undeniable signatures are a concept introduced by Chaum and van Antwerpen in 1989 ([10]). They are signatures that cannot be verified without interacting with the signer, and they are useful in situations where the validity of a signature must not be universally verifiable. For example, a software vendor might want to embed signatures into his products and allow only paying customers to check the authenticity of these products. If the vendor actually signed a message, he must be able to convince the customer of this fact using a confirmation protocol and, if he did not, he must also be able to convince the customer that he is not the signer using a denial protocol. These proofs have to be non-transferable: once a verifier is convinced that the vendor did or did not sign a message, he should be unable to transmit this conviction to a third party.

In some applications, a signer needs to decide not only when but also by
whom his signatures can be verified. For example, a voting center can give a
voter a proof that his vote was counted without giving him the opportunity to
convince someone else of how he voted. That is the motivation for designated
verifier proofs for undeniable signatures. This kind of proof involves the verifier's
public key in such a way that he cannot convince a third party that a signer
actually signed a message or not, because he is able to produce such a valid proof
himself using his private key. Several proof systems were proposed for undeniable
signatures ([18], [26], [33], ...). The use of non-transferable designated verifier
proofs ([26]) can provide non-interactive confirmation and denial protocols.

Several examples of undeniable signature schemes based on the discrete
logarithm were proposed ([10], [11], [12]) and the original construction of Chaum
and van Antwerpen ([10]) was proven secure in 2001 by Okamoto and Pointcheval
([31]) thanks to a new kind of computational assumption. Several convertible¹
undeniable signatures were proposed ([6], [16], [29], ...). RSA-based undeniable
signatures were designed by Gennaro, Krawczyk and Rabin ([21]) and by Galbraith,
Mao and Paterson ([19]). However, no secure identity based undeniable signature
has been proposed so far. A solution was proposed in [24] but it was shown in
[40] to be insecure. In this paper, we show how bilinear maps over elliptic curves
can provide such a provably secure scheme. It is known ([30]) that an undeniable
signature can be built from any public key encryption scheme and a similar
result is likely to hold in the ID-based setting. However, the scheme described here
offers security that is more tightly related to some computational problem
than a scheme derived from the Boneh-Franklin IBE ([4]).

Chaum, van Heijst and Pfitzmann introduced the notion of 'invisibility' for
undeniable signatures ([12]). Intuitively, it corresponds to the inability of a
distinguisher to decide whether a message-signature pair is valid for a given user
or not. The RSA-based schemes described in [19] and [21] do not provide
invisibility. In [20], Galbraith and Mao describe a new RSA-based undeniable signature
that provides invisibility under the so-called composite decision Diffie-Hellman


¹ See [6]. Convertible undeniable signatures are undeniable signatures that can be
converted by the signer into universally verifiable signatures.
assumption and they show that invisibility and anonymity² are essentially
equivalent security notions for undeniable signature schemes satisfying some particular
conditions. In this paper, we extend these two security notions to the identity
based setting and we prove in the random oracle model that our scheme is both
existentially unforgeable and invisible under some reasonable computational
assumptions. Invisibility and anonymity can also be shown to be equivalent in the
context of identity based cryptography, but we will not do so here.

In section 2, we first recall the properties of pairings over elliptic curves
before formally describing security notions related to identity based undeniable
signatures. In section 3, we describe the different components of our scheme. We
then show their correctness and we discuss their efficiency. The rest of the
paper consists of a security analysis of the scheme in the random oracle model.

2 Preliminaries 

2.1 Overview of Pairings and Bilinear Problems 

Let us consider groups $\mathbb{G}_1$ and $\mathbb{G}_2$ of the same prime order $q$. We need a bilinear
map $e: \mathbb{G}_1 \times \mathbb{G}_1 \to \mathbb{G}_2$ satisfying the following properties:

1. Bilinearity: $\forall P, Q \in \mathbb{G}_1$, $\forall a, b \in \mathbb{Z}_q^*$, we have $e(aP, bQ) = e(P, Q)^{ab}$.

2. Non-degeneracy: for any $P \in \mathbb{G}_1$, $e(P, Q) = 1$ for all $Q \in \mathbb{G}_1$ iff $P = \mathcal{O}$.

3. Computability: some efficient algorithm can compute $e(P, Q)$ for all $P, Q \in \mathbb{G}_1$.
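To make the three properties concrete, the following sketch uses a deliberately insecure toy map, which is our own illustrative assumption and not a pairing from [4]: $\mathbb{G}_1$ is $(\mathbb{Z}_q, +)$ with generator $P = 1$, $\mathbb{G}_2$ is the order-$q$ subgroup of $\mathbb{Z}_p^*$ for the safe prime $p = 2q + 1$, and $e(x, y) = g^{xy}$. It is bilinear and non-degenerate, but discrete logarithms in $\mathbb{G}_1$ are trivial, so it offers no security at all.

```python
import secrets

# Toy parameters (insecure, for illustration only).
q = 1019         # prime order of G1 and G2
p = 2 * q + 1    # 2039, a safe prime
g = 4            # generates the order-q subgroup of Z_p^*
P = 1            # generator of G1 = (Z_q, +)


def e(x: int, y: int) -> int:
    """Toy bilinear map e: G1 x G1 -> G2, e(x, y) = g^(x*y) mod p."""
    return pow(g, (x * y) % q, p)


def check_properties(trials: int = 100) -> bool:
    """Spot-check bilinearity on random inputs and non-degeneracy at P."""
    for _ in range(trials):
        X, Y = secrets.randbelow(q), secrets.randbelow(q)
        a, b = 1 + secrets.randbelow(q - 1), 1 + secrets.randbelow(q - 1)
        # Bilinearity: e(aX, bY) == e(X, Y)^(ab)
        if e(a * X % q, b * Y % q) != pow(e(X, Y), a * b, p):
            return False
    # Non-degeneracy: e(P, P) != 1, so e(P, .) is not the trivial map
    return e(P, P) != 1


print(check_properties())  # True
```

Any real instantiation replaces `e` by a Weil or Tate pairing on a suitable elliptic curve; only the algebraic behaviour is mirrored here.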

Typical admissible bilinear maps are obtained from a modification of the Weil
pairing (see [4]) or from the Tate pairing (the original Weil pairing is defined
over a non-cyclic group, see [4] for details). The security of the schemes described
in this paper relies on the hardness of the following problems.

Definition 1. Given groups $\mathbb{G}_1$ and $\mathbb{G}_2$ of prime order $q$, a bilinear map
$e: \mathbb{G}_1 \times \mathbb{G}_1 \to \mathbb{G}_2$ and a generator $P$ of $\mathbb{G}_1$,

- the Bilinear Diffie-Hellman problem (BDH) in $\langle \mathbb{G}_1, \mathbb{G}_2, e \rangle$ is to compute
$e(P, P)^{abc}$ given $(P, aP, bP, cP)$.

- The Decisional Bilinear Diffie-Hellman problem (DBDH) is, given
$(P, aP, bP, cP)$ and $z \in \mathbb{G}_2$, to decide whether $z = e(P, P)^{abc}$ or not. The
advantage of a distinguisher $\mathcal{D}$ for the DBDH problem is defined as
$$\mathrm{Adv}(\mathcal{D}) = \Big| \Pr_{a,b,c \xleftarrow{R} \mathbb{Z}_q,\, h \xleftarrow{R} \mathbb{G}_2}\big[1 \leftarrow \mathcal{D}(aP, bP, cP, h)\big] - \Pr_{a,b,c \xleftarrow{R} \mathbb{Z}_q}\big[1 \leftarrow \mathcal{D}(aP, bP, cP, e(P, P)^{abc})\big] \Big|.$$

- The Gap Bilinear Diffie-Hellman problem is to solve a given instance
$(P, aP, bP, cP)$ of the BDH problem with the help of a DBDH oracle that is
able to decide whether a tuple $(P, a'P, b'P, c'P, z)$ is such that $z = e(P, P)^{a'b'c'}$
or not. Such tuples will be called DBDH tuples.
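The DBDH advantage can be estimated empirically. The sketch below uses the same kind of insecure toy map ($e(x, y) = g^{xy}$ with $P = 1$) as a stand-in for a real pairing group, which is an assumption for illustration, together with a hypothetical distinguisher `toy_distinguisher` that exploits the fact that, in this toy group, $aP$ reveals $a$. Its advantage is therefore close to 1, which is precisely why real instantiations rely on elliptic-curve groups where no such shortcut is known.

```python
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


def toy_distinguisher(aP, bP, cP, z):
    """Hypothetical distinguisher: in the toy group, aP *is* a (since P = 1)."""
    return 1 if z == pow(e(P, P), aP * bP * cP, p) else 0


def estimate_advantage(D, trials=2000):
    """Monte Carlo estimate of |Pr[1 <- D(.., h)] - Pr[1 <- D(.., e(P,P)^abc)]|."""
    ones_random = ones_real = 0
    for _ in range(trials):
        a, b, c = (1 + secrets.randbelow(q - 1) for _ in range(3))
        h = pow(g, secrets.randbelow(q), p)        # uniform element of G2
        ones_random += D(a * P % q, b * P % q, c * P % q, h)
        ones_real += D(a * P % q, b * P % q, c * P % q, pow(e(P, P), a * b * c, p))
    return abs(ones_random - ones_real) / trials


print(estimate_advantage(toy_distinguisher))   # close to 1
```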

² This security notion is related to the inability of an adversary to decide which user
generated a particular message-signature pair in a multi-user setting.
The DBDH problem was introduced in [13], where it is shown to be no harder
than the decisional Diffie-Hellman problem in $\mathbb{G}_2$. It is not known whether the
Bilinear Diffie-Hellman problem is strictly easier than the computational Diffie-Hellman
problem in $\mathbb{G}_1$ or not. It is also an open question whether the DBDH
problem is strictly easier than the BDH one (although it is obviously not harder).
Nevertheless, no probabilistic polynomial time algorithm is known to solve any
of them with a non-negligible advantage so far.

2.2 Identity Based Undeniable Signatures 

An identity based undeniable signature scheme consists of three algorithms and two
possibly interactive protocols.

Setup: the PKG takes as input a security parameter $k$ and produces a
public/private key pair $(s, P_{pub})$ and the system's public parameters params. $s$
is the system's master key and $P_{pub}$ is the PKG's public key that must be
certified.

Keygen: given a user's identity $ID$, the PKG uses its master secret key $s$
to compute the corresponding private key $d_{ID}$ and transmits it to the user
through a secure channel.

Sign: given a message $M \in \{0,1\}^*$ and his private key $d_{ID}$, the user generates
a signature $\sigma$ associated to $M$ for his identity $ID$.

Confirm: is a protocol between a signer and a (possibly designated) verifier
that takes as input a message $M \in \{0,1\}^*$, an identity $ID \in \{0,1\}^*$, the
associated private key $d_{ID}$ and a valid signature $\sigma$ for the pair $(M, ID)$. The
output of the protocol is a (possibly non-interactive) non-transferable proof
that $\sigma$ is actually a valid signature on $M$ for the identity $ID$.

Deny: takes as input an invalid signature $\sigma$ for a given pair $(M, ID)$ and the
private key $d_{ID}$ corresponding to $ID$. Its output is a proof that $\sigma$ is not a
valid signature for a message $M$ and an identity $ID$.

Confirm and Deny may be a single protocol. In our scheme, they are distinct.

2.3 Security Notions for Identity Based Undeniable Signatures 

The first security notion for ID-based undeniable signatures is close to the one for
other existing identity based signatures: it is the notion of existential
unforgeability under chosen-message attacks.

Definition 2. An identity based undeniable signature scheme is said to be
existentially unforgeable under chosen-message attacks if no probabilistic
polynomial time (PPT) adversary has a non-negligible advantage in this game:

1. The challenger runs the setup algorithm to generate the system's parameters
and sends them to the adversary.

2. The adversary $\mathcal{F}$ performs a series of queries:

- Key extraction queries: $\mathcal{F}$ produces an identity $ID$ and receives the
private key $d_{ID}$ corresponding to $ID$.

- Signature queries: $\mathcal{F}$ produces a message $M$ and an identity $ID$ and
receives a signature on $M$ that was generated by the signature oracle
using the private key corresponding to the public key $ID$.

- Confirmation/denial queries: $\mathcal{F}$ produces a message-signature pair
$(M, \sigma)$ and an identity $ID$ and gives them to the signature oracle, which
runs the confirmation/denial protocol, using the private key $d_{ID}$
corresponding to $ID$, to convince $\mathcal{F}$ (in a non-transferable way) that
$\sigma$ is actually related to $M$ and $ID$ or that it is not.

3. After a polynomial number of queries, $\mathcal{F}$ produces a tuple $(ID, M, \sigma)$ made
of an identity $ID$, whose corresponding private key was not asked to the
challenger during stage 2, and a message-signature pair $(M, \sigma)$ that was not
issued by the signature oracle during stage 2 for the identity $ID$.

The forger $\mathcal{F}$ wins the game if it is able to provide a non-transferable proof
of validity of the signature $\sigma$ for message $M$ and identity $ID$. Its advantage
is defined to be its probability of success taken over the coin-flippings of the
challenger and $\mathcal{F}$.

A second security notion, introduced by Chaum, van Heijst and Pfitzmann
([12]), is called 'invisibility'. Informally, this notion corresponds to the inability
of a dishonest verifier to decide whether a given signature on a given message
was issued by some signer, even after having observed several executions of
confirmation/denial protocols by the same signer for other signatures. Galbraith and
Mao ([20]) proposed a general definition for this security notion. In the identity
based setting, we need to strengthen it slightly to take into account the fact that a
dishonest user might be in possession of private keys associated to other identities
before trying to validate or invalidate an alleged signature on a message for an
identity without the help of the alleged signer.

Definition 3. An identity based undeniable signature scheme is said to satisfy
the invisibility property if no PPT distinguisher $\mathcal{D}$ has a non-negligible
advantage against a challenger in the following game:

1. The challenger performs the setup of the scheme and sends the system's
public parameters to $\mathcal{D}$.

2. The distinguisher $\mathcal{D}$ performs a polynomially bounded number of queries: key
extraction queries, signature queries and confirmation/denial queries of the
same kind as those of the previous definition.

3. After a first series of queries, $\mathcal{D}$ asks for a challenge: it produces a pair
$(M, ID)$ made of a message and an identity for which the associated private
key was not asked in step 2. The challenger then flips a coin $b \xleftarrow{R} \{0,1\}$. If
$b = 0$, the challenger sends $\mathcal{D}$ a valid signature $\sigma$ on $M$ for the identity $ID$.
Otherwise, $\mathcal{D}$ receives from the challenger a random element $\sigma \xleftarrow{R} \mathcal{S}$ taken
from the signature space $\mathcal{S}$.

4. The distinguisher $\mathcal{D}$ then performs a second series of queries. This time,
it is not allowed to perform a confirmation/denial query for the challenge
$(\sigma, M, ID)$ nor to ask for the private key associated to $ID$.

5. At the end of the game, $\mathcal{D}$ outputs a bit $b'$ (that is 0 if $\mathcal{D}$ finds that $(\sigma, M, ID)$
is a valid message-signature-identity tuple and 1 otherwise) and wins the
game if $b = b'$.

$\mathcal{D}$'s advantage in this game is defined to be $\mathrm{Adv}^{\mathrm{inv}}(\mathcal{D}) := 2\Pr[b = b'] - 1$, where
the probability is taken over the coin flippings of the distinguisher $\mathcal{D}$ and the
challenger.

Similarly to what is done in [20], we also consider the notion of anonymity. This
notion is slightly strengthened in the identity based setting.

Definition 4. We say that an identity based undeniable signature scheme
satisfies the anonymity property if no probabilistic polynomial time distinguisher $\mathcal{D}$
has a non-negligible advantage in the following game:

1. The challenger performs the setup of the scheme and sends the system's
public parameters to $\mathcal{D}$.

2. The distinguisher $\mathcal{D}$ performs a polynomially bounded number of queries: key
extraction queries, signature queries and confirmation/denial queries of the
same kind as those of definition 2.

3. After a first series of queries, $\mathcal{D}$ requests a challenge: it produces a message
$M$ and a pair of identities $ID_0, ID_1$ for which it did not ask the associated
private keys in the first stage. The challenger then flips a coin $b \xleftarrow{R} \{0,1\}$
and computes the signature $\sigma$ on $M$ with the private key associated to $ID_b$.
$\sigma$ is sent as a challenge to $\mathcal{D}$.

4. $\mathcal{D}$ performs another series of queries. This time, it is not allowed to perform
a confirmation or denial query for the challenge $\sigma$ on identities $ID_0, ID_1$
nor to request the private keys associated to these identities.

At the end of the game, $\mathcal{D}$ outputs a bit $b'$ for which it finds that $\sigma$ is a valid
signature on $M$ for the identity $ID_{b'}$. It wins the game if $b' = b$. Its advantage
is defined as in the previous definition.

It is shown in [20] that the notions of invisibility and anonymity are
essentially equivalent for undeniable and confirmer signature schemes satisfying some
particular properties. It is straightforward (by using the techniques of [20]) to
show that this equivalence also holds in the identity based setting. In the next
section, we describe an example of an identity based undeniable signature scheme and we
show its existential unforgeability and its invisibility in the random oracle model.



3 An Identity Based Undeniable Signature

Our ID-based undeniable signature scheme is made of the following algorithms. 

Setup: given security parameters $k$ and $l$, the PKG chooses groups $\mathbb{G}_1$ and $\mathbb{G}_2$
of prime order $q > 2^k$, a generator $P$ of $\mathbb{G}_1$, a bilinear map $e: \mathbb{G}_1 \times \mathbb{G}_1 \to \mathbb{G}_2$
and hash functions $H_1: \{0,1\}^* \to \mathbb{G}_1$, $H_2: \{0,1\}^* \times \{0,1\}^* \times \{0,1\}^* \to \mathbb{G}_1$,
$H_3: \mathbb{G}_2^3 \to \mathbb{Z}_q$ and $H_4: \mathbb{G}_2^4 \to \mathbb{Z}_q$. It chooses a master secret $s \xleftarrow{R} \mathbb{Z}_q^*$
and computes $P_{pub} = sP \in \mathbb{G}_1$, which is made public. The system's public
parameters are
$$params := \{q, \mathbb{G}_1, \mathbb{G}_2, e, P, P_{pub}, H_1, H_2, H_3, H_4\}.$$

Keygen: given an identity $ID$, the PKG computes $Q_{ID} = H_1(ID) \in \mathbb{G}_1$ and
the associated private key $d_{ID} = sQ_{ID} \in \mathbb{G}_1$ that is transmitted to the user.

Sign: to sign a message $M \in \{0,1\}^*$, the signer Alice uses the private key
$d_{ID_A}$ associated to her identity $ID_A$.

1. She picks a random $l$-bit string $r$ to compute $H_2(M, r, ID_A) \in \mathbb{G}_1$.

2. She then computes $\gamma = e(H_2(M, r, ID_A), d_{ID_A}) \in \mathbb{G}_2$. The signature on
$M$ is given by
$$\sigma = \langle r, \gamma \rangle = \langle r, e(H_2(M, r, ID_A), d_{ID_A}) \rangle \in \{0,1\}^* \times \mathbb{G}_2.$$
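A minimal sketch of Setup, Keygen and Sign over an insecure toy map ($\mathbb{G}_1 = \mathbb{Z}_q$ with $P = 1$, $e(x, y) = g^{xy}$), assuming a SHA-256-based stand-in `H` for the random oracles $H_1$ and $H_2$ (tagged to keep them apart) and a 13-byte (104-bit) salt as an approximation of $l \approx 100$. All names are hypothetical and nothing here is a real pairing library.

```python
import hashlib
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    """Toy bilinear map e(x, y) = g^(x*y) mod p."""
    return pow(g, (x * y) % q, p)


def H(tag, *parts):
    """Toy random oracle onto the nonzero elements of Z_q (stands in for H1, H2)."""
    data = tag + b"".join(repr(x).encode() + b"|" for x in parts)
    return 1 + int.from_bytes(hashlib.sha256(data).digest(), "big") % (q - 1)


def setup():
    s = 1 + secrets.randbelow(q - 1)           # master secret s in Z_q^*
    return s, s * P % q                        # (s, P_pub = sP)


def keygen(s, identity: bytes):
    return s * H(b"H1", identity) % q          # d_ID = s * Q_ID with Q_ID = H1(ID)


def sign(d_id, identity: bytes, message: bytes, salt_bytes: int = 13):
    r = secrets.token_bytes(salt_bytes)        # random salt r (104 bits here)
    gamma = e(H(b"H2", message, r, identity), d_id)
    return r, gamma                            # sigma = <r, gamma>
```

Anyone can check that a private key is well formed, since $e(P, d_{ID}) = e(P_{pub}, Q_{ID})$, whereas recomputing $\gamma$ requires $d_{ID}$ (or the PKG's master secret $s$).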

Confirm: to verify a signature $\sigma$ on a message $M$, a verifier of identity $ID_B$
needs the help of the signer Alice. He sends her the pair $(M, \sigma)$, where
$\sigma = \langle r, \gamma \rangle \in \{0,1\}^* \times \mathbb{G}_2$ is the alleged signature. Alice then runs the
following confirmation protocol to produce a non-interactive designated-verifier
proof that $\sigma$ is a valid signature on $M$ for her identity $ID_A$:

a. She first computes $Q_{ID_B} = H_1(ID_B)$.

b. She picks $U, R \xleftarrow{R} \mathbb{G}_1$ and $v \xleftarrow{R} \mathbb{Z}_q$ and computes
$$c = e(P, U)\, e(P_{pub}, Q_{ID_B})^v \in \mathbb{G}_2,$$
$$g_1 = e(P, R) \in \mathbb{G}_2 \quad\text{and}\quad g_2 = e(H_2(M, r, ID_A), R) \in \mathbb{G}_2.$$

c. She takes the hash value $h = H_3(c, g_1, g_2) \in \mathbb{Z}_q$.

d. She computes $S = R + (h + v)d_{ID_A}$.

The proof is made of $(U, v, h, S)$. To check its validity, the verifier first
computes $c' = e(P, U)\, e(P_{pub}, Q_{ID_B})^v$, $g_1' = e(P, S)\, e(P_{pub}, Q_{ID_A})^{-(h+v)}$ and
$$g_2' = e(H_2(M, r, ID_A), S)\, \gamma^{-(h+v)}$$
and accepts if and only if $h = H_3(c', g_1', g_2')$.
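The confirmation protocol can be sketched as a prover/verifier pair. This is an illustrative sketch, not the authors' implementation: it assumes the insecure toy map $e(x, y) = g^{xy}$ on $\mathbb{G}_1 = \mathbb{Z}_q$ with $P = 1$, a hypothetical SHA-256-based oracle `H` standing for $H_1, H_2, H_3$, and byte-string identities. `confirm_verify` recomputes $c'$, $g_1'$ and $g_2'$ with the exponent $-(h+v)$ reduced modulo $q$.

```python
import hashlib
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


def H(tag, *parts):
    """Toy random oracle onto the nonzero elements of Z_q."""
    data = tag + b"".join(repr(x).encode() + b"|" for x in parts)
    return 1 + int.from_bytes(hashlib.sha256(data).digest(), "big") % (q - 1)


def confirm_prove(d_A, ID_A, ID_B, P_pub, M, sig):
    r, gamma = sig
    Q_B = H(b"H1", ID_B)
    H2 = H(b"H2", M, r, ID_A)
    U, R, v = (secrets.randbelow(q) for _ in range(3))
    c = e(P, U) * pow(e(P_pub, Q_B), v, p) % p   # trapdoor commitment
    g1, g2 = e(P, R), e(H2, R)
    h = H(b"H3", c, g1, g2)
    S = (R + (h + v) * d_A) % q
    return U, v, h, S


def confirm_verify(ID_A, ID_B, P_pub, M, sig, proof):
    r, gamma = sig
    U, v, h, S = proof
    Q_A, Q_B = H(b"H1", ID_A), H(b"H1", ID_B)
    H2 = H(b"H2", M, r, ID_A)
    k = -(h + v) % q                              # exponent -(h + v)
    c = e(P, U) * pow(e(P_pub, Q_B), v, p) % p
    g1 = e(P, S) * pow(e(P_pub, Q_A), k, p) % p
    g2 = e(H2, S) * pow(gamma, k, p) % p
    return h == H(b"H3", c, g1, g2)
```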

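Deny: in order to convince a designated verifier of identity $ID_B$ that a given
signature $\sigma = \langle r, \gamma \rangle$ on a message $M$ is not valid for her identity $ID_A$,
Alice proceeds as follows.

a. Alice computes $Q_{ID_B} = H_1(ID_B) \in \mathbb{G}_1$ and picks random $U \xleftarrow{R} \mathbb{G}_1$,
$v \xleftarrow{R} \mathbb{Z}_q$ to compute $c = e(P, U)\, e(P_{pub}, Q_{ID_B})^v$.

b. She computes a commitment $C = \big(e(H_2(M, r, ID_A), d_{ID_A}) / \gamma\big)^{\omega}$ for a randomly
chosen $\omega \xleftarrow{R} \mathbb{Z}_q$.

c. She proves in a zero-knowledge way that she knows a pair $(R, \alpha) \in
\mathbb{G}_1 \times \mathbb{Z}_q$ such that
$$C = \frac{e(H_2(M, r, ID_A), R)}{\gamma^{\alpha}} \quad\text{and}\quad 1 = \frac{e(P, R)}{e(P_{pub}, Q_{ID_A})^{\alpha}}. \qquad (1)$$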

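To do this,

1. She picks $V \xleftarrow{R} \mathbb{G}_1$, $\nu \xleftarrow{R} \mathbb{Z}_q$ to compute
$$\rho_1 = e(H_2(M, r, ID_A), V)\, \gamma^{-\nu} \in \mathbb{G}_2 \quad\text{and}\quad \rho_2 = e(P, V)\, y^{-\nu} \in \mathbb{G}_2,$$
where $y = e(P_{pub}, Q_{ID_A})$.

2. She computes $h = H_4(C, c, \rho_1, \rho_2) \in \mathbb{Z}_q$.

3. She computes $S = V + (h + v)R \in \mathbb{G}_1$ and $s = \nu + (h + v)\alpha \in \mathbb{Z}_q$.

The proof is made of $(C, U, v, h, S, s)$. It can be verified by the verifier
of identity $ID_B$, who rejects the proof if $C = 1$ and otherwise computes
$c' = e(P, U)\, e(P_{pub}, Q_{ID_B})^v$, $\rho_1' = e(H_2(M, r, ID_A), S)\, \gamma^{-s} C^{-(h+v)}$ and
$\rho_2' = e(P, S)\, y^{-s}$, where $y = e(P_{pub}, Q_{ID_A})$. The verifier accepts the proof
if and only if $h = H_4(C, c', \rho_1', \rho_2')$.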
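A sketch of the denial protocol under the same kind of assumptions: the insecure toy map $e(x, y) = g^{xy}$ on $\mathbb{G}_1 = \mathbb{Z}_q$ with $P = 1$, and a hypothetical SHA-256-based oracle `H` standing for $H_1, H_2, H_4$. The prover uses the commitment $C = (e(H_2(M,r,ID_A), d_{ID_A})/\gamma)^{\omega}$ and the witness $(R, \alpha) = (\omega d_{ID_A}, \omega)$, which is one pair satisfying equations (1). For a valid signature $C = 1$, so the verifier rejects any denial.

```python
import hashlib
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


def H(tag, *parts):
    """Toy random oracle onto the nonzero elements of Z_q."""
    data = tag + b"".join(repr(x).encode() + b"|" for x in parts)
    return 1 + int.from_bytes(hashlib.sha256(data).digest(), "big") % (q - 1)


def deny_prove(d_A, ID_A, ID_B, P_pub, M, sig):
    r, gamma = sig
    Q_A, Q_B = H(b"H1", ID_A), H(b"H1", ID_B)
    H2, y = H(b"H2", M, r, ID_A), e(P_pub, Q_A)
    U, v = secrets.randbelow(q), secrets.randbelow(q)
    c = e(P, U) * pow(e(P_pub, Q_B), v, p) % p              # step a
    omega = 1 + secrets.randbelow(q - 1)
    C = pow(e(H2, d_A) * pow(gamma, -1, p) % p, omega, p)   # step b
    R, alpha = omega * d_A % q, omega                        # witness for (1)
    V, nu = secrets.randbelow(q), secrets.randbelow(q)
    rho1 = e(H2, V) * pow(gamma, -nu % q, p) % p
    rho2 = e(P, V) * pow(y, -nu % q, p) % p
    h = H(b"H4", C, c, rho1, rho2)
    S = (V + (h + v) * R) % q
    s = (nu + (h + v) * alpha) % q
    return C, U, v, h, S, s


def deny_verify(ID_A, ID_B, P_pub, M, sig, proof):
    r, gamma = sig
    C, U, v, h, S, s = proof
    if C == 1:
        return False
    Q_A, Q_B = H(b"H1", ID_A), H(b"H1", ID_B)
    H2, y = H(b"H2", M, r, ID_A), e(P_pub, Q_A)
    c = e(P, U) * pow(e(P_pub, Q_B), v, p) % p
    rho1 = e(H2, S) * pow(gamma, -s % q, p) * pow(C, -(h + v) % q, p) % p
    rho2 = e(P, S) * pow(y, -s % q, p) % p
    return h == H(b"H4", C, c, rho1, rho2)
```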

The confirmation protocol is inspired by a designated verifier proof ([26])
proposed by Jakobsson, Sako and Impagliazzo that allows a prover to convince a
designated verifier of the equality of two discrete logarithms. The denial
protocol is an adaptation of a protocol proposed by Camenisch and Shoup ([8])
to prove the inequality of two discrete logarithms. Both adaptations are
non-transferable proofs of, respectively, equality and inequality of two inverses of the
group isomorphisms $f_Q: \mathbb{G}_1 \to \mathbb{G}_2$, $U \mapsto f_Q(U) = e(Q, U)$ with $Q = P$ and
$Q = H_2(M, r, ID_A)$. In an execution of the confirmation protocol, the verifier B
takes the signature as valid if he is convinced that $f_P(d_{ID_A}) = e(P_{pub}, Q_{ID_A})$
and $\gamma$ have equal pre-images for the isomorphisms $f_P(\cdot) = e(P, \cdot)$ and
$f_{H_2(M,r,ID_A)}(\cdot) = e(H_2(M, r, ID_A), \cdot)$. In the denial protocol, he takes the signature
as invalid on $M$ for identity $ID_A$ if he is convinced that these inverses differ.



Completeness and soundness of the confirmation protocol: it is easy
to see that a correct proof is always accepted by the verifier B: if $(U, v, h, S)$ is
correctly computed by the prover, we have $e(P, S) = e(P, R)\, e(P, d_{ID_A})^{h+v}$ and
$e(P, d_{ID_A}) = e(P_{pub}, Q_{ID_A})$. We also have
$$e(H_2(M, r, ID_A), S) = e(H_2(M, r, ID_A), R)\, e(H_2(M, r, ID_A), d_{ID_A})^{h+v},$$
where $e(H_2(M, r, ID_A), d_{ID_A}) = \gamma$ for a valid signature, so that $g_1' = g_1$ and $g_2' = g_2$.

To show the soundness, we notice that if a prover is able to provide two
correct answers $S_1, S_2$ for the same commitment $(c, g_1, g_2)$ and two different
challenges $h_1$ and $h_2$, we then have $e(P, (h_1 - h_2)^{-1}(S_1 - S_2)) = e(P_{pub}, Q_{ID_A})$ and
$e(H_2(M, r, ID_A), (h_1 - h_2)^{-1}(S_1 - S_2)) = \gamma$. This shows that both inverses
$f_P^{-1}(e(P_{pub}, Q_{ID_A}))$ and $f_{H_2(M,r,ID_A)}^{-1}(\gamma)$ are equal.
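The soundness argument is an extraction: from two answers to the same commitment, one computes $(h_1 - h_2)^{-1}(S_1 - S_2)$, which is a common pre-image of $e(P_{pub}, Q_{ID_A})$ and $\gamma$. The toy sketch below (insecure map $e(x, y) = g^{xy}$ on $\mathbb{Z}_q$ with $P = 1$, and small values of our own choosing for $s$, $Q_{ID_A}$ and $H_2(M, r, ID_A)$) checks this numerically.

```python
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


def extract(h1, S1, h2, S2):
    """Compute (h1 - h2)^(-1) (S1 - S2) from two answers to one commitment."""
    return (S1 - S2) * pow((h1 - h2) % q, -1, q) % q


# Toy values (our choice): P_pub = sP, d_A = s Q_A, H2 stands for H2(M, r, ID_A).
s, Q_A, H2 = 5, 17, 123
P_pub, d_A = s * P % q, s * Q_A % q
gamma = e(H2, d_A)                       # a valid signature component
R, v = secrets.randbelow(q), secrets.randbelow(q)
h1, h2 = 11, 42                          # two distinct challenges
S1, S2 = (R + (h1 + v) * d_A) % q, (R + (h2 + v) * d_A) % q
W = extract(h1, S1, h2, S2)
print(e(P, W) == e(P_pub, Q_A), e(H2, W) == gamma)   # True True
```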



Completeness and soundness of the denial protocol: one easily checks
that an honest prover is always accepted by the designated verifier. To prove the
soundness, one notices that if the prover is able to provide a proof of knowledge
of a pair $(R, \alpha)$ satisfying equations (1), then the second of these equations
implies $R = \alpha f_P^{-1}(y)$ with $y = e(P_{pub}, Q_{ID_A})$ by the bilinearity of the map. If
we substitute this relation into the first equation of (1), it follows that
$$C = \left(\frac{e(H_2(M, r, ID_A), f_P^{-1}(y))}{\gamma}\right)^{\alpha}.$$
Since the verifier checks that $C \neq 1$, it follows that $e(H_2(M, r, ID_A), f_P^{-1}(y)) \neq \gamma$
and the signature $\gamma$ is actually invalid. The soundness of the proof of knowledge
in step c is easy to verify.

Non-transferability: in order for the proofs to be non-transferable, both
protocols need a trapdoor commitment $\mathrm{Commit}(U, v) = e(P, U)\, e(P_{pub}, Q_{ID_B})^v$
that allows the owner of the private key $d_{ID_B}$ to compute commitment
collisions: indeed, given a tuple $(U, v, \mathrm{Commit}(U, v))$, B can easily use $d_{ID_B}$ to
find a pair $(U', v')$ such that $\mathrm{Commit}(U, v) = \mathrm{Commit}(U', v')$. This is essential
for the proof to be non-transferable: the verifier B cannot convince a third
party of the validity or of the invalidity of a signature since his knowledge
of the private key allows him to produce such a proof himself. Indeed,
given a message-signature pair $(M, \sigma)$, with $\sigma = \langle r, \gamma \rangle \in \{0,1\}^* \times \mathbb{G}_2$, B
can choose $S \xleftarrow{R} \mathbb{G}_1$, $x \xleftarrow{R} \mathbb{Z}_q$ and $U' \xleftarrow{R} \mathbb{G}_1$ to compute $c = e(P, U')$,
$g_1 = e(P, S)\, e(P_{pub}, Q_{ID_A})^{-x}$, $g_2 = e(H_2(M, r, ID_A), S)\, \gamma^{-x}$ and $h = H_3(c, g_1, g_2)$.
He can then compute $v = x - h \bmod q$ and $U = U' - v\, d_{ID_B} \in \mathbb{G}_1$, where $d_{ID_B}$ is
the verifier's private key. $(U, v, h, S)$ is thus a valid proof built by the verifier with
the trapdoor $d_{ID_B}$. This trapdoor also allows him to produce a false proof of a
given signature's invalidity using the same technique with the denial protocol.
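The simulation argument can be checked directly: the sketch below lets B, knowing only $d_{ID_B}$, produce a transcript that passes the confirmation check for an arbitrary, possibly invalid, pair $(M, \sigma)$. It assumes the insecure toy map $e(x, y) = g^{xy}$ on $\mathbb{Z}_q$ with $P = 1$ and a hypothetical SHA-256-based oracle `H`; `confirm_verify` is our rendering of the verification equations of the confirmation protocol.

```python
import hashlib
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


def H(tag, *parts):
    """Toy random oracle onto the nonzero elements of Z_q."""
    data = tag + b"".join(repr(x).encode() + b"|" for x in parts)
    return 1 + int.from_bytes(hashlib.sha256(data).digest(), "big") % (q - 1)


def confirm_verify(ID_A, ID_B, P_pub, M, sig, proof):
    r, gamma = sig
    U, v, h, S = proof
    Q_A, Q_B = H(b"H1", ID_A), H(b"H1", ID_B)
    H2 = H(b"H2", M, r, ID_A)
    k = -(h + v) % q
    c = e(P, U) * pow(e(P_pub, Q_B), v, p) % p
    g1 = e(P, S) * pow(e(P_pub, Q_A), k, p) % p
    g2 = e(H2, S) * pow(gamma, k, p) % p
    return h == H(b"H3", c, g1, g2)


def simulate_confirm(d_B, ID_A, P_pub, M, sig):
    """B builds a confirmation transcript with his trapdoor d_B only."""
    r, gamma = sig
    Q_A, H2 = H(b"H1", ID_A), H(b"H2", M, r, ID_A)
    S, x, U1 = (secrets.randbelow(q) for _ in range(3))
    c = e(P, U1)                                    # c = e(P, U')
    g1 = e(P, S) * pow(e(P_pub, Q_A), -x % q, p) % p
    g2 = e(H2, S) * pow(gamma, -x % q, p) % p
    h = H(b"H3", c, g1, g2)
    v = (x - h) % q
    U = (U1 - v * d_B) % q                          # U = U' - v d_B
    return U, v, h, S
```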

Efficiency considerations: From an efficiency point of view, the signature
generation algorithm requires one pairing evaluation as its most expensive
operation. The confirmation and denial protocols are more expensive: the first one
requires 4 pairing evaluations (3 if $e(P_{pub}, Q_{ID_B})$ is cached in memory, which can
be done if the verifier often performs verification queries), one exponentiation
in $\mathbb{G}_2$ and one computation of the type $\lambda_1 P + \lambda_2 Q$ in $\mathbb{G}_1$. The verifier needs
to compute 3 pairings (2 if $e(P_{pub}, Q_{ID_A})$ is cached), 3 exponentiations and 3
multiplications in $\mathbb{G}_2$. In the denial protocol, the prover must compute 5 pairings
(4 if $e(P_{pub}, Q_{ID_B})$ is cached), 4 exponentiations and 4 multiplications in $\mathbb{G}_2$,
one computation of the type $\lambda_1 P + \lambda_2 Q$ and some extra arithmetic operations
in $\mathbb{Z}_q$. The verifier must compute 4 pairings (3 if $e(P_{pub}, Q_{ID_B})$ is cached),
2 exponentiations, 1 multi-exponentiation and 3 multiplications in $\mathbb{G}_2$. To improve
the efficiency of the confirmation and denial algorithms, one can speed up the
computation of commitments. Indeed, the prover can pre-compute $e(P, P)$ once
and for all. To generate a commitment in an execution of the confirmation
protocol, he then picks $u, v, x \xleftarrow{R} \mathbb{Z}_q$ and computes $c = e(P, P)^u e(P_{pub}, Q_{ID_B})^v$,
$R = xP$, $g_1 = e(P, P)^x$ and $g_2 = e(H_2(M, r, ID_A), R)$. The answer to the challenge
$h$ must then be computed as $S = R + (h + v)d_{ID_A}$ and the proof is made of
$(u, v, h, S)$. This technique can also be applied to the denial protocol. It allows
replacing 2 pairing evaluations by 2 scalar multiplications, an exponentiation and
a multi-exponentiation in $\mathbb{G}_2$ (to compute $c$). A single pairing evaluation is then
required for the prover at each run of the confirmation and denial protocols.
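The pre-computation trick amounts to committing with $U = uP$, so that $e(P, U) = e(P, P)^u$. The sketch below works over the insecure toy map $e(x, y) = g^{xy}$ (with $P = 1$) and takes a cached value `y_B` standing for $e(P_{pub}, Q_{ID_B})$; both are assumptions for illustration. It leaves a single pairing, $g_2$, per commitment and can be checked against the pairing-based formulas.

```python
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


e_PP = e(P, P)                  # pre-computed once and for all


def fast_commitment(y_B: int, H2: int):
    """Confirmation commitment with e(P, P) pre-computed and y_B cached.

    Only g2 = e(H2, R) still needs a pairing evaluation.
    """
    u, v, x = (secrets.randbelow(q) for _ in range(3))
    c = pow(e_PP, u, p) * pow(y_B, v, p) % p   # = e(P, uP) * y_B^v
    R = x * P % q                              # scalar multiplication R = xP
    g1 = pow(e_PP, x, p)                       # = e(P, R)
    g2 = e(H2, R)
    return (u, v, R), (c, g1, g2)
```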

Overall, a signature verification turns out to be more expensive than
a signature generation, even if a pre-computation is performed. Our ID-based
undeniable signature solution nevertheless remains reasonably efficient.

If we consider the length of signatures, the binary representation of a pairing
value is about 1000 bits long (1024 if we use the same curve as in [4]) while the
length $l$ of the binary string $r$ can be of the order of 100 bits. This provides us
with signatures of about 1100 bits. This is roughly one half of the size of the
RSA-based undeniable signature proposed in [20] for 1024-bit moduli. If we compare
our scheme with the original undeniable signature proposed by Chaum and
van Antwerpen and proven secure by Okamoto and Pointcheval ([31]), both lengths
are similar if the Chaum-van Antwerpen scheme is used over a group like $\mathbb{Z}_p^*$ with
$|p| = 1000$ (this is no longer true if this scheme is used over a suitable elliptic
curve). However, it remains an open problem to devise identity based undeniable
signature schemes with shorter signatures than ours.



Convertible signatures: It is easy to see that issued signatures can
be selectively turned into universally verifiable signatures by the signer. In order
to convert a genuine signature $\sigma = \langle r, e(H_2(M, r, ID_A), d_{ID_A}) \rangle$, the signer
Alice just has to take a random $x \xleftarrow{R} \mathbb{Z}_q$ and compute $R = xP$, $g_1 = e(P, P)^x$,
$g_2 = e(H_2(M, r, ID_A), R)$, the hash value $h = H(g_1, g_2)$ and the answer
$S = R + h\, d_{ID_A}$. The proof, given by $(h, S) \in \mathbb{Z}_q \times \mathbb{G}_1$, is easily universally verifiable
by a method similar to the verification in the confirmation protocol. Alice can
also give a universally verifiable proof that a given signature is invalid for her
identity by using the non-designated verifier counterpart of the denial protocol.
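A sketch of the conversion proof $(h, S)$. The text only says it is verified by a method similar to the confirmation protocol, so the check below, $g_1' = e(P, S)\, e(P_{pub}, Q_{ID_A})^{-h}$, $g_2' = e(H_2(M, r, ID_A), S)\, \gamma^{-h}$ and $h = H(g_1', g_2')$, is our assumption by analogy. The toy map $e(x, y) = g^{xy}$ (with $P = 1$) and the SHA-256-based oracle `H` are illustrative stand-ins.

```python
import hashlib
import secrets

q, p, g, P = 1019, 2039, 4, 1   # toy group, insecure, illustration only


def e(x, y):
    return pow(g, (x * y) % q, p)


def H(tag, *parts):
    """Toy random oracle onto the nonzero elements of Z_q."""
    data = tag + b"".join(repr(x).encode() + b"|" for x in parts)
    return 1 + int.from_bytes(hashlib.sha256(data).digest(), "big") % (q - 1)


def convert(d_A, ID_A, M, sig):
    """Universally verifiable proof (h, S) that sig is valid for ID_A."""
    r, _ = sig
    H2 = H(b"H2", M, r, ID_A)
    x = secrets.randbelow(q)
    R = x * P % q
    g1, g2 = pow(e(P, P), x, p), e(H2, R)
    h = H(b"H", g1, g2)
    return h, (R + h * d_A) % q


def universal_verify(P_pub, ID_A, M, sig, proof):
    """Check by analogy with the confirmation protocol (our assumption)."""
    r, gamma = sig
    h, S = proof
    H2 = H(b"H2", M, r, ID_A)
    k = -h % q                                  # exponent -h
    g1 = e(P, S) * pow(e(P_pub, H(b"H1", ID_A)), k, p) % p
    g2 = e(H2, S) * pow(gamma, k, p) % p
    return h == H(b"H", g1, g2)
```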



Removing key escrow: In order to prevent a dishonest PKG from issuing a sig- 
nature on behalf of a user and from compromising the invisibility and anonymity 
properties, one can easily use the generic transformation proposed by Al-Riyami 
and Paterson ([1]) to turn the scheme into a certificateless undeniable signature. 
Unfortunately, the advantage of easy key management is lost since the resulting 
scheme no longer supports human-memorizable public keys. On the other hand, 
key escrow, which is often an undesirable feature in signature schemes, is then 
removed as well as the need for public key certificates. 

4 Security Proofs for the ID-Based Undeniable Signature 

The following theorem claims the scheme’s existential unforgeability under adap- 
tive chosen-message attacks. 

Theorem 1. In the random oracle model, if there exists an adversary $\mathcal{F}$ that
is able to succeed in an existential forgery against the identity based undeniable
signature scheme described in the previous section with an advantage $\epsilon$ within a
time $t$ and when performing $q_E$ key extraction queries, $q_S$ signature queries, $q_{CD}$
confirmation/denial queries and $q_{H_i}$ queries on hash oracles $H_i$, for $i = 1, \ldots, 4$,
then there exists an algorithm $\mathcal{B}$ that is able to solve the Bilinear Diffie-Hellman
problem with an advantage
$$\epsilon' \geq \frac{\epsilon - (2q_{H_3} + q_{CD} + 1)/2^k}{e(q_E + 1)(q_{CD} + 1)}$$
in a time $t' \leq t + 6\tau_p + (q_E + q_{H_1} + q_{H_2})\tau_m + (q_S + q_{CD})\tau_e + 2q_{CD}\tau_{me}$, where $\tau_p$
denotes the time required for a pairing evaluation, $\tau_m$ is the time to perform a
multiplication in $\mathbb{G}_1$, $\tau_e$ is the time to perform an exponentiation in $\mathbb{G}_2$, $\tau_{me}$ the
time for a multi-exponentiation in $\mathbb{G}_2$ and $e$ is the base for the natural logarithm.

Proof, given in the full paper ([28]). □ 

In order for the proof to hold, we must have $q_{H_2} \ll 2^l$, where $l$ is the size
of the random salt $r$. We then take $l = 100$. We note that the reduction is
not really efficient: if we take $q_E \approx q_{CD} < 2^{30}$ and $q_{H_3} < 2^{80}$, we then have
$(2q_{H_3} + q_{CD} + 1)2^{-k} \approx 1.65 \times 10^{-24}$ for $k = 160$. If we assume $\epsilon - (2q_{H_3} + q_{CD} + 1)2^{-k} > \epsilon/2$,
we obtain the bound $\epsilon' > \epsilon/2^{63}$. However, we have a proof with a tighter bound
if the underlying assumption is the hardness of the Gap Bilinear Diffie-Hellman
problem. The use of a DBDH oracle allows algorithm $\mathcal{B}$ to perfectly simulate the
confirmation/denial protocols. The advantage of algorithm $\mathcal{B}$ is then
$$\epsilon' \geq \frac{\epsilon - (2q_{H_3} + q_{CD} + 1)2^{-k}}{e(q_E + 1)} > \frac{\epsilon}{2^{32}}.$$
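The figures above can be reproduced with a few lines of arithmetic. The query bounds follow the text; $k = 160$ is the group-size assumption needed to obtain the quoted $1.65 \times 10^{-24}$, and the reduction losses are printed as base-2 logarithms, so they can be compared with the exponents 63 and 32.

```python
import math

q_E = q_CD = 2**30          # key extraction and confirmation/denial queries
q_H3 = 2**80                # queries to H3
k = 160                     # assumed bit size in the 2^(-k) term

collision_term = (2 * q_H3 + q_CD + 1) / 2**k
# Theorem 1 (BDH), assuming eps - collision_term > eps / 2:
loss_bdh = 2 * math.e * (q_E + 1) * (q_CD + 1)
# Gap BDH variant: denominator e(q_E + 1) only
loss_gbdh = math.e * (q_E + 1)

print(f"{collision_term:.3g}")          # 1.65e-24
print(round(math.log2(loss_bdh), 2))    # 62.44
print(round(math.log2(loss_gbdh), 2))   # 31.44
```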



We note that using the techniques of Katz and Wang ([27]) easily allows 
replacing the random salt r by a single bit and then obtaining signatures that 
are about 100 bits shorter without losing security guarantees. 

The theorem below claims the scheme’s invisibility in the sense of Galbraith 
and Mao (see [20]) under the Decisional Bilinear Diffie-Hellman assumption. 

Theorem 2. In the random oracle model, the identity based undeniable signa- 
ture presented in section 3 satisfies the invisibility property provided the De- 
cisional Bilinear Diffie-Hellman problem is hard. More formally, if we assume 
that no algorithm is able to forge a signature in the game of definition 2 with
a non-negligible probability and if a distinguisher $\mathcal{D}$ is able to distinguish valid
signatures from invalid ones for a message and an identity of its choice with
a non-negligible advantage $\epsilon$ after having asked $q_E$ key extraction queries, then
there exists a distinguisher B that has an advantage e' > for the DBDH 

problem within a running time bounded as in theorem 1. 

Proof, given in the full paper ([28]). □ 

It is possible to directly show that the scheme also satisfies the anonymity prop- 
erty in the random oracle model under the Decisional Bilinear Diffie-Hellman 
assumption. However, since anonymity and invisibility are essentially equiva- 
lent, the anonymity of our signature derives from its invisibility property. 
5 Another Application

Our construction provides an application of independent interest, namely the
possibility to design an identity based signature with a 'tighter' security proof
than all other existing provably secure identity based signatures ([9], [17], [23], [25]),
whose security proofs make use of the forking lemma ([34], [35]). Indeed,
by concatenating a signature produced by the undeniable scheme with a
non-designated verifier proof of its validity (using the non-designated verifier
counterpart of the confirmation protocol), we obtain a universally verifiable identity
based signature whose security is the most tightly related to some hard
computational problem. Recall that all existing identity based signatures have a
security proof built on the forking lemma of Pointcheval and Stern, which involves a
degradation in security during the reduction, as pointed out in [22]: if $q_h$ denotes
the number of message hash queries and $t$ the forger's running time, then the
upper bound on the average running time to solve the problem is $q_h t$ (if we assume
$q_h < 2^{80}$, this makes a great degradation for the bound on the running time).
This new ID-based signature may be viewed as an adaptation of the signature
recently proposed by Goh and Jarecki ([22]). It is less efficient than all the other
ones but is more tightly related to some computational assumption than those
in ([9], [25]), which are only loosely related to the computational Diffie-Hellman
problem, or the scheme in ([23]), whose security is loosely related to the RSA
assumption. As for the proof of unforgeability of the undeniable signature under
the Gap Bilinear Diffie-Hellman assumption, one can show that, if the forger's
advantage is $\epsilon$, then the average time to solve the BDH problem is smaller than
$2^{32}t/\epsilon$, where $t$ is the running time of the forger. The corresponding bound for
the identity based signature described in [25] is roughly if 2^° identity 

hash queries and 2®° message hash queries ^ are allowed to the attacker. 



6 Conclusions 

We showed in this paper a first construction for a provably secure identity based
undeniable signature and we extended the range of primitives for identity based
cryptography ([!]). A proof of existential unforgeability under the Bilinear
Diffie-Hellman assumption is given in the full paper ([28]). We pointed out that a
'tighter' proof can be made under the Gap Bilinear Diffie-Hellman assumption.
We also extended the notions of invisibility and anonymity of Galbraith and
Mao ([20]) to the identity based setting and we proved the invisibility of our
scheme in the random oracle model under the Decisional Bilinear Diffie-Hellman
assumption.

As a side effect, our construction allows the design of an identity based
signature scheme whose security is more tightly related to the hardness of some
computational problem than that of any other existing provably secure identity based signature.

³ Hashing onto a finite field may be viewed as an operation of unit cost, while hashing
onto an elliptic curve requires some extra computation, as explained in [2].
Compressing Rabin Signatures
Abstract. This note presents a method to compress a Rabin signature
[Rab78] to about half of its length.



Let $n > 1$ be a (square-free) public key of the Rabin signature scheme and let
$(s, m)$ be a Rabin signature of a message $m$, i.e.,
$$s^2 \equiv h(m) \pmod{n},$$
where $h$ is a message formatting function. The goal of this note is to replace $s$ by
a positive integer $v$ smaller than $\sqrt{n}$ such that $v$, $n$ and $m$ are sufficient to recover
the signature $s$ without knowledge of the secret key. The paper assumes that $h$
is deterministic, i.e., the value $h(m)$ can be computed without knowledge of
$s$. For example, PKCS #1 v1.5 [RSA93] signatures use deterministic formatting,
but RSA-PSS [BR96] signatures do not.

Previously, Coron and Naccache [CN03] and independently Bernstein [Ber]
have shown that a Rabin signature can be reconstructed if, e.g., more than half
of the most significant bits of $s$ are known. They use Coppersmith’s LLL-based
root finding method [Cop96]. This method leads to a slow decompression when
the fraction of known bits is close to 1/2. Bernstein notes, however, that a fast
decompression method can be found when at least 2/3 of the bits are given.

Continued Fractions: Let $\alpha$ be a positive real number. Define $\alpha_0 = \alpha$, $q_0 = \lfloor \alpha_0 \rfloor$,
and define recursively $\alpha_{i+1} = 1/\{\alpha_i\}$, $q_{i+1} = \lfloor \alpha_{i+1} \rfloor$ for all $i \ge 0$ until $\{\alpha_i\} = 0$,
where $\{x\} = x - \lfloor x \rfloor$ denotes the fractional part. Then the principal
convergents $u_i/v_i$ of $\alpha$ can be computed by $u_0 = q_0$, $v_0 = 1$, $u_1 = q_0 q_1 + 1$, $v_1 = q_1$
and $u_{i+2} = q_{i+2} u_{i+1} + u_i$, $v_{i+2} = q_{i+2} v_{i+1} + v_i$. The theory of continued fractions
asserts that the principal convergents $u_i/v_i$ are close rational approximations
of $\alpha$. In particular the following inequality is satisfied (see e.g. [Knu81, §4.5.3,
Eq. (12)], [Lan95, Ch. 1, Theorem 5]):

$$|v_i \alpha - u_i| \le 1/v_{i+1}. \tag{1}$$

If $\alpha$ is rational then there exists an integer $k$ with $\{\alpha_k\} = 0$ and $u_k/v_k = \alpha$.
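To make the recurrence concrete, here is a minimal Python sketch (our own illustration, not code from the note) that computes the principal convergents $u_i/v_i$ of a rational $\alpha = num/den$ with exact integer arithmetic, so the loop stops exactly when the fractional part becomes 0. The seeds $u_{-2} = 0$, $u_{-1} = 1$, $v_{-2} = 1$, $v_{-1} = 0$ are a standard convention we assume so that the general recurrence reproduces $u_0, v_0, u_1, v_1$ as defined above; the function name `convergents` is hypothetical.

```python
from fractions import Fraction


def convergents(num: int, den: int) -> list[tuple[int, int]]:
    """Principal convergents (u_i, v_i) of alpha = num/den, for num >= 0 and den > 0."""
    u_prev2, u_prev1 = 0, 1  # u_{-2}, u_{-1}: give u_0 = q_0 and u_1 = q_0*q_1 + 1
    v_prev2, v_prev1 = 1, 0  # v_{-2}, v_{-1}: give v_0 = 1 and v_1 = q_1
    result = []
    while den != 0:
        q, rem = divmod(num, den)  # q_i = floor(alpha_i); rem/den is {alpha_i}
        u = q * u_prev1 + u_prev2
        v = q * v_prev1 + v_prev2
        result.append((u, v))
        u_prev2, u_prev1 = u_prev1, u
        v_prev2, v_prev1 = v_prev1, v
        num, den = den, rem  # alpha_{i+1} = 1/{alpha_i} = den/rem
    return result


if __name__ == "__main__":
    alpha = Fraction(355, 113)
    cf = convergents(alpha.numerator, alpha.denominator)
    print(cf)  # [(3, 1), (22, 7), (355, 113)]
    # Inequality (1): |v_i * alpha - u_i| <= 1/v_{i+1} for consecutive convergents.
    for (u, v), (_, v_next) in zip(cf, cf[1:]):
        assert abs(v * alpha - u) <= Fraction(1, v_next)
```

For 355/113 the second-to-last convergent 22/7 gives $|7 \cdot 355/113 - 22| = 1/113$, so for a rational $\alpha$ the bound can be attained with equality; the check therefore uses ≤.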

Compression: We compress a signature $(s, m)$ as follows: If $\gcd(s, n) \ne 1$ then
output an error and stop. Let $u_i/v_i$, $i = 1, \ldots, k$ be the principal convergents of
the continued fraction expansion of $s/n$. Let $\ell$ be such that $v_\ell < \sqrt{n} < v_{\ell+1}$.
Then $(v_\ell, m)$ is the compressed Rabin signature.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 126-128, 2004. 
Verification and Decompression: Let $(v, m)$ be a compressed signature. If
$\gcd(v, n) \ne 1$ then output an error and stop. Otherwise, compute $0 < t < n$
such that

$$t \equiv h(m) v^2 \pmod{n}.$$

The compressed signature is valid if and only if $t$ is a square in $\mathbb{Z}$. If it is valid,
then set $w = \sqrt{t}$ and $s = w/v \pmod{n}$ and output $(s, m)$.
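The compression and decompression steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: $n > 1$ is not a perfect square, and the caller already has the integer value $h(m)$ (passed as `h_m`; the message formatting itself is out of scope). All function names are hypothetical. Only the convergent denominators are needed, and for non-square $n$ the test $v < \sqrt{n}$ is equivalent to $v^2 < n$, which keeps everything in integer arithmetic.

```python
from math import gcd, isqrt
from typing import Optional


def convergent_denominators(num: int, den: int) -> list[int]:
    """Denominators v_0, v_1, ..., v_k of the principal convergents of num/den."""
    v_prev2, v_prev1 = 1, 0  # v_{-2}, v_{-1}
    out = []
    while den != 0:
        q, rem = divmod(num, den)  # q_i = floor(alpha_i)
        v = q * v_prev1 + v_prev2
        out.append(v)
        v_prev2, v_prev1 = v_prev1, v
        num, den = den, rem
    return out


def compress(s: int, n: int) -> int:
    """Return v_l with v_l < sqrt(n) < v_{l+1}. Assumes n > 1 is not a perfect square."""
    if gcd(s, n) != 1:
        raise ValueError("gcd(s, n) != 1")
    # For non-square n, v < sqrt(n) holds exactly when v*v < n.
    v = max(v for v in convergent_denominators(s, n) if v * v < n)
    if gcd(v, n) != 1:
        raise ValueError(f"nontrivial factor of n: {gcd(v, n)}")
    return v


def decompress(v: int, h_m: int, n: int) -> Optional[int]:
    """Return s with s^2 = h(m) mod n, or None if the compressed signature is invalid."""
    if gcd(v, n) != 1:
        raise ValueError("gcd(v, n) != 1")
    t = (h_m * v * v) % n  # t = h(m) v^2 mod n
    w = isqrt(t)
    if w * w != t:  # valid iff t is a square in Z
        return None
    return (w * pow(v, -1, n)) % n  # s = w / v mod n


if __name__ == "__main__":
    n = 1000003 * 1000033  # toy modulus; real keys are far larger
    s = 123456789          # stand-in signature with gcd(s, n) = 1
    h_m = s * s % n        # the matching formatted message h(m)
    v = compress(s, n)
    print(v, decompress(v, h_m, n) in (s, n - s))
```

Because $w = |r|$ with $r \equiv v_\ell s \pmod{n}$, decompression returns either $s$ or $n - s$; both satisfy $s^2 \equiv h(m) \pmod{n}$, so either is a valid Rabin signature.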

Analysis: Neither compression nor decompression needs to use the secret key.
The following theorem shows that any valid Rabin signature can be converted 
into a valid compressed signature and vice versa. Thus Rabin signatures and 
compressed signatures are equally difficult to forge. 

Theorem 1. Let $n$ be a Rabin public key that is square free.

(I) If (s,m) is a valid Rabin signature then the compression algorithm gen- 
erates a valid compressed signature for m or finds a nontrivial factor of n. 

(II) If (v,m) is a valid compressed signature then the decompression algo- 
rithm generates a valid Rabin signature for m. 

Proof. (I) By assumption $(s, m)$ is a valid signature, i.e., $s^2 \equiv h(m) \pmod{n}$
and $\gcd(s, n) = 1$. The latter condition implies $v_k = n$ for the last principal
convergent $u_k/v_k = s/n$. Since the denominators $v_1, \ldots, v_k$ are strictly increasing,
there exists $\ell$ with $v_\ell < \sqrt{n} < v_{\ell+1}$, and therefore the compression is well defined.
Let $(v_\ell, m)$ be the signature computed by the compression algorithm. Setting
$\alpha = s/n$ in (1) implies

$$|v_\ell s - u_\ell n| \le n/v_{\ell+1} < \sqrt{n}.$$

Hence there exists $r \in \mathbb{Z}$ with $|r| < \sqrt{n}$ such that $r \equiv v_\ell s \pmod{n}$. By assumption
$n$ is not a square, and $r \ne 0$ because $\gcd(s, n) = 1$ and $0 < v_\ell < n$; hence $0 < r^2 < n$ and
$r^2 \equiv v_\ell^2 h(m) \pmod{n}$. Thus the value $t$ computed in the decompression is indeed a square in $\mathbb{Z}$.
Hence $(v_\ell, m)$ is a valid compressed signature unless $\gcd(v_\ell, n) > 1$, in which case
$\gcd(v_\ell, n)$ is a nontrivial factor of $n$.

(II) The compressed signature $(v, m)$ is by assumption valid if the value $0 < t < n$ with
$t \equiv v^2 h(m) \pmod{n}$ is a square in $\mathbb{Z}$. Then $w = \sqrt{t}$ is an integer and thus
$s^2 \equiv w^2/v^2 \equiv h(m) \pmod{n}$. Hence $(s, m)$ is a valid Rabin signature. □

Time Complexity: Compression requires a continued fraction expansion and
takes time $O(\log(n)^2)$. Decompression requires two multiplications and an
inverse over $\mathbb{Z}/n\mathbb{Z}$, and a square root in $\mathbb{Z}$, and hence also takes time $O(\log(n)^2)$.
Note that these bounds are obtained by using schoolbook methods. Asymptotically
faster algorithms (e.g., FFT-based gcd) are not optimal for typical key
sizes.

A Variant: An alternative compressed signature is $(|r|, m)$, where $r \in \mathbb{Z}$ is
such that $|r| < \sqrt{n}$ and $r \equiv v_\ell s \pmod{n}$. In the proof of Theorem 1 it is shown
that such an $r$ exists when $v_\ell < \sqrt{n} < v_{\ell+1}$. A compressed signature is valid
if $r^2/h(m) \bmod n$ is a square in $\mathbb{Z}$. Decompression is done using the equality
$(v_\ell)^2 \equiv r^2/h(m) \pmod{n}$, after which $s \equiv |r|/v_\ell \pmod{n}$ is a valid signature. This variant
is more expensive, because the verifier has to compute an additional modular inverse. But the
variant has the advantage that the verification accepts both compressed and uncompressed signatures
without modification (an uncompressed $s$ passes with $v_\ell = 1$, since $s^2/h(m) \equiv 1 \pmod{n}$).
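The variant can be sketched the same way. Since $r \equiv v_\ell s$ and $s^2 \equiv h(m)$, we get $r^2 \equiv v_\ell^2 h(m) \pmod{n}$, so the quantity that must be a small square in $\mathbb{Z}$ is $r^2/h(m) \bmod n = v_\ell^2$. In this hypothetical sketch `compress_variant` outputs $|r|$ using the centred representative of $v_\ell s \bmod n$, and `verify_variant` accepts either $|r|$ or a full signature $s$. We assume $\gcd(h(m), n) = 1$ so that $h(m)$ is invertible, and `h_m` is again the already formatted integer $h(m)$.

```python
from math import gcd, isqrt
from typing import Optional


def convergent_denominators(num: int, den: int) -> list[int]:
    """Denominators v_0, v_1, ..., v_k of the principal convergents of num/den."""
    v_prev2, v_prev1 = 1, 0
    out = []
    while den != 0:
        q, rem = divmod(num, den)
        v = q * v_prev1 + v_prev2
        out.append(v)
        v_prev2, v_prev1 = v_prev1, v
        num, den = den, rem
    return out


def compress_variant(s: int, n: int) -> int:
    """Return |r| with r = v_l * s mod n taken in (-n/2, n/2]; then |r| < sqrt(n)."""
    if gcd(s, n) != 1:
        raise ValueError("gcd(s, n) != 1")
    v = max(v for v in convergent_denominators(s, n) if v * v < n)
    r = (v * s) % n
    if r > n // 2:
        r -= n  # centred representative
    return abs(r)


def verify_variant(r: int, h_m: int, n: int) -> Optional[int]:
    """Accept |r| (compressed) or s (uncompressed); return a valid s or None."""
    x = (r * r * pow(h_m, -1, n)) % n  # r^2 / h(m) mod n, equal to v_l^2 when valid
    v = isqrt(x)
    if v == 0 or v * v != x or gcd(v, n) != 1:
        return None
    return (r * pow(v, -1, n)) % n  # s = r / v_l mod n (up to sign)


if __name__ == "__main__":
    n = 1000003 * 1000033  # toy modulus
    s = 123456789
    h_m = s * s % n
    print(verify_variant(compress_variant(s, n), h_m, n) in (s, n - s))
    print(verify_variant(s, h_m, n) == s)  # True: the uncompressed input is accepted
```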

RSA: The method can be extended to RSA signatures with small public exponent
(i.e., $e = 3$), but the benefits are smaller. For $e = 3$ the signature can be
compressed to 2/3 of its size as follows.

Assume that

$$s^3 \equiv h(m) \pmod{n}$$

is an RSA signature, where $h$ is again a deterministic formatting function. To
compress a signature one computes the continued fraction expansion of $s/n$
and selects the principal convergent $u_\ell/v_\ell$ satisfying $v_\ell < n^{2/3} < v_{\ell+1}$. The
compressed RSA signature is $(v_\ell, m)$.

Equation (1) implies

$$|v_\ell s - u_\ell n| \le n/v_{\ell+1} < n^{1/3},$$

and thus there exists $r \in \mathbb{Z}$ with $|r| < n^{1/3}$ and $r^3 \equiv h(m)(v_\ell)^3 \pmod{n}$.
Given $h(m)$ and $v_\ell$, this value $r$ can be found by checking whether either of
$h(m)(v_\ell)^3 \bmod n$ or $n - (h(m)(v_\ell)^3 \bmod n)$ is a cube in $\mathbb{Z}$. Finally, one can
reconstruct the signature by setting $s \equiv r/v_\ell \pmod{n}$.
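A sketch of the $e = 3$ case, under the same assumptions as before (hypothetical names; `h_m` is the integer $h(m)$; an integer cube root by binary search stands in for whatever root routine an implementation would use). The cube test needs $|r|^3 < n$, i.e. $|r| < n^{1/3}$, and since $|r| \le n/v_{\ell+1}$ this forces $v_{\ell+1} > n^{2/3}$; compression therefore keeps the last convergent denominator with $v^3 < n^2$. Decompression tries $t = h(m)v^3 \bmod n$ (the case $r > 0$) and $n - t$ (the case $r < 0$).

```python
from math import gcd
from typing import Optional


def convergent_denominators(num: int, den: int) -> list[int]:
    """Denominators v_0, v_1, ..., v_k of the principal convergents of num/den."""
    v_prev2, v_prev1 = 1, 0
    out = []
    while den != 0:
        q, rem = divmod(num, den)
        v = q * v_prev1 + v_prev2
        out.append(v)
        v_prev2, v_prev1 = v_prev1, v
        num, den = den, rem
    return out


def icbrt(x: int) -> int:
    """Largest c >= 0 with c**3 <= x, for x >= 0 (binary search)."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


def compress_e3(s: int, n: int) -> int:
    """Return v_l with v_l < n^(2/3) < v_{l+1}, tested as v^3 < n^2."""
    if gcd(s, n) != 1:
        raise ValueError("gcd(s, n) != 1")
    return max(v for v in convergent_denominators(s, n) if v ** 3 < n * n)


def decompress_e3(v: int, h_m: int, n: int) -> Optional[int]:
    """Recover s from r^3 = h(m) v^3 mod n with |r| < n^(1/3)."""
    t = (h_m * pow(v, 3, n)) % n
    v_inv = pow(v, -1, n)
    for cand, sign in ((t, 1), (n - t, -1)):  # r > 0: r^3 = t; r < 0: |r|^3 = n - t
        c = icbrt(cand)
        if c ** 3 == cand:
            s = (sign * c * v_inv) % n  # s = r / v mod n
            if pow(s, 3, n) == h_m % n:
                return s
    return None


if __name__ == "__main__":
    n = 1000003 * 1000033  # toy modulus
    s = 123456789
    h_m = pow(s, 3, n)
    v = compress_e3(s, n)
    print(v.bit_length(), n.bit_length(), pow(decompress_e3(v, h_m, n), 3, n) == h_m)
```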



Acknowledgment. I thank the program committee for their suggestions.

References 

[Ber] D. J. Bernstein. Squeezing Rabin signatures, unpublished. 

[BR96] Mihir Bellare and Phillip Rogaway. The exact security of digital signatures:
how to sign with RSA and Rabin. In Ueli Maurer, editor, Advances in Cryptology -
EUROCRYPT ’96, volume 1070 of Lecture Notes in Computer Science, pages 399-416.
Springer Verlag, 1996.

[CN03] Jean-Sébastien Coron and David Naccache. Procédé de réduction de la taille
d’une signature RSA ou Rabin. Patent FR28293333, March 2003.

[Cop96] Don Coppersmith. Finding a small root of a univariate modular equation. In 
Advances in Cryptology - EUROCRYPT ’96, volume 1070 of Lecture Notes 
in Computer Science, pages 155-165, Berlin, 1996. Springer Verlag. 

[Knu81] Donald E. Knuth. The art of computer programming, Seminumerical Algo- 
rithms, volume 2. Addison Wesley, 2nd edition, 1981. 

[Lan95] Serge Lang. Introduction to Diophantine Approximations. Springer Verlag, 
new expanded edition, 1995. 

[Rab78] Michael O. Rabin. Digitalized signatures. Foundations of Secure
Computation, pages 155-169, 1978.

[RSA93] RSA Data Security, Inc. PKCS #1: RSA Encryption Standard. Redwood
City, CA, November 1993. Version 1.5.




A Key Recovery System as Secure as Factoring 



Adam Young¹ and Moti Yung²



¹ Cigital Labs
ayoung@cigital.com

² Dept. of Computer Science, Columbia University
moti@cs.columbia.edu



Abstract. There has been a lot of recent work in the area of proving
in zero-knowledge that an RSA modulus N is in the correct form. For
example, protocols have been given that prove that N is the product of
two safe primes, two primes nearly equal in size, etc. Such proof systems
are rather remarkable in what they achieve, but may be regarded as
heavyweight protocols due to the computational and messaging
overhead they impose. In this paper an efficient zero-knowledge protocol
is given that simultaneously proves that N is a Blum integer and that
its factorization is recoverable. The proof system requires that the RSA
primes p and q satisfy $p \equiv q \equiv 3 \bmod 4$, and it also requires a semantically
secure encryption scheme. The solution is therefore amenable for use with
systems based on PKCS #1. A proof is given that shows that our
algorithm is secure under the integer factorization problem (and can be
turned into a non-interactive proof in the random oracle model).

Keywords: RSA, Rabin, Blum integer, quadratic residue, pseudosquare,
zero-knowledge, public key cryptography, PKCS #1, semantic security,
chosen ciphertext security, standard compatibility, key recovery.



1 Introduction 

The RSA [27] algorithm has gained widespread use in the security industry.
However, many existing systems trust users to generate RSA key pairs honestly
without any form of compliance checking. Without knowing a given user’s private
key, the simplest compliance tests include verifying that N is not divisible by
small primes, that N is not prime, and that N is not a perfect power. However,
using zero-knowledge techniques, much more can be proven about the form of
N.

A zero-knowledge (ZK) protocol has been given for showing that N is the
product of two safe primes [6]. Also, a zero-knowledge protocol has been given
that proves that N is square-free (see Section 2 for the definition), with a soundness
error of $1/2^K$ where K is the number of rounds in the proof [1]. A protocol that
does not leak too many bits has been given that proves that N is the product
of two nearly equal primes, assuming that N has already been proven to be the
product of two distinct primes [17]. Also, a proof system has been given for proving
that N is a Blum integer [16]. Finally, a zero-knowledge proof of membership
in the set of pseudosquares mod N has been presented [10]. A PVSS for secret
sharing of factoring based private keys was presented by Fujisaki and Okamoto
[8]. This PVSS relies on a non-standard strong RSA assumption. Related work
that encompasses verifiable encryptions is by Camenisch and Damgaard [5].

When key recovery is needed, a higher level of compliance verification is verifying
that a given user’s private key is recoverable by a designated recovery
authority. Ideally such systems will ensure that only the user knows his or her
own private key until such time as it is needed by the key recovery authority
or authorities. A key recovery system for RSA was presented by Poupard and
Stern that has the advantage that the ciphertext of the escrowed private key
is very small [24], in fact considerably smaller than the non-interactive zero-knowledge
proof transcript that is proposed here. The system is based on the
Paillier cryptosystem, which is semantically secure under the composite residuosity
class assumption [23]. It is argued that the system can also be based on the
Naccache-Stern [20] and Okamoto-Uchiyama [21] cryptosystems. The Naccache-Stern
cryptosystem is semantically secure under the Prime Residuosity Assumption.
The Okamoto-Uchiyama cryptosystem uses public keys of the form $n = p^2 q$
and is semantically secure under the p-subgroup assumption. We remark that
Paillier’s cryptosystem¹ uses public keys that are not compatible with PKCS
#1 as currently defined. In this paper a factoring based key recovery system is
presented, and it is proven secure assuming that there exists a public key cryptosystem
that is semantically secure against plaintext attacks for single messages
in the uniform model. Since such cryptosystems exist under the factoring
assumption (e.g., Optimal Asymmetric Encryption [3,28] based on Rabin [25]),
we prove that our solution is secure if and only if factoring a Blum integer is
hard. We remark that OAEP is semantically secure against adaptive chosen ciphertext
attacks in the random oracle model under the partial one-wayness of
the underlying permutation [9].

Though the space reduction is remarkable and the combination of methods 
ingenious, a significant drawback to the Poupard-Stern approach is that it relies 
on cryptosystems that are not secure against chosen ciphertext attacks². The
reason that the Paillier cryptosystem was used as the primary example is that 
it is not clear how a chosen ciphertext attack can be used effectively against it 
in this scenario. So, to be on the safe side their solution imposes the restriction 
that the recovery authority decrypts messages upon request and does not reveal 
the decryptions themselves. This is not the case in the solution we propose here. 
The escrow authority can use any public key encryption algorithm including one 
that is secure against adaptive chosen ciphertext attacks. This implies that the 
range of application of our solution is larger (this scenario was advocated by 
Camenisch and Shoup [7]). 

The efficiency of the Poupard-Stern solution is as follows [24]. The probability
of a cheating strategy to succeed during a proof of fairness is smaller than $1/B^\ell$,

¹ Paillier’s cryptosystem is remarkable since it constitutes an additive homomorphism
under multiplication modulo $p^2 q^2$.

² E.g., the Paillier scheme is malleable.
so $\ell|B|$ must be large enough, e.g., $\ell|B| = 80$, in order to guarantee a high level
of security (here $|B|$ denotes the bit length of B). Also, the workload for a third party
to defeat the security is $O(\sqrt{B})$ in the worst case, so B cannot be too large. They
recommend setting $\ell = 2$ and $B = 2^{40}$. The value of $\ell$ is the number of iterations
in the proof system. The system therefore has a recovery time vs. space trade-off. By
increasing $\ell$, the size of the non-interactive transcript goes up. By increasing B, the
size of the transcript goes down, but recovery takes longer. By taking B = 2, the transcript
winds up being the size of a typical non-interactive ZK transcript. The solution
therefore has obvious advantages in practice in terms of space efficiency, but
does not constitute an asymptotic improvement over existing methods for the
exponentially small advantage cases.

The solution we propose makes no assumptions regarding the form of the escrow
authority public key. However, having the escrow authority use a composite
public key $N_1$ is a good choice. Our solution is a slightly optimized version of an
earlier version of the algorithm [29]. It requires that the modulus whose private
key is being escrowed be a Blum integer that is not a perfect power, but otherwise
it is an RSA number. This implies that the solution is highly compatible
with PKCS #1. Consider a PKCS #1 key generator that outputs primes p and
q, among other values. If p ends in the two bits 01 or if q ends in the two bits 01,
then the key pair can be rejected and a new one generated. Only when p ends
in 11 and q ends in 11 is the key pair accepted. This way, acceptance/rejection
is used to guarantee that $p \equiv q \equiv 3 \bmod 4$. Note that multiprecision library
calls are not needed to verify this. For example, assuming that the prime p can
be obtained as a binary number in the form of a byte array, the least significant
byte u ends in the two bits 11 if and only if the bitwise logical AND of u and
0x03 equals 0x03 in hexadecimal. Compatibility with PKCS #1 is important
since RSA is so widely used.
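The acceptance test on the last byte can be sketched as follows. This is only an illustration under stated assumptions: each prime is available as a big-endian byte string (the octet order used by the PKCS #1 integer-to-octet-string conversion), and `accept_key_pair` is a hypothetical hook that a key generator would call before outputting p and q. No multiprecision arithmetic is involved.

```python
def ends_in_11(prime_bytes: bytes) -> bool:
    """True iff the big-endian integer in prime_bytes is 3 mod 4 (last two bits 11)."""
    u = prime_bytes[-1]  # least significant byte comes last in big-endian order
    return (u & 0x03) == 0x03


def accept_key_pair(p_bytes: bytes, q_bytes: bytes) -> bool:
    """Accept the generated primes only if both end in the bits 11 (p = q = 3 mod 4)."""
    return ends_in_11(p_bytes) and ends_in_11(q_bytes)


def int_to_be_bytes(x: int) -> bytes:
    """Minimal big-endian encoding of a positive integer."""
    return x.to_bytes((x.bit_length() + 7) // 8, "big")


if __name__ == "__main__":
    # Example values only: 1000003 = 3 mod 4 but 1000033 = 1 mod 4, so the pair is rejected.
    print(accept_key_pair(int_to_be_bytes(1000003), int_to_be_bytes(1000033)))  # False
```

On rejection the key generator would simply draw a new pair, as described above.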

It is shown that in the random oracle model, 2m iterations of the non-interactive
version of the protocol are needed to achieve a computational zero-knowledge
error of at most $1/2^m$ (note that since PKCS compatibility is a
major goal and since PKCS #1 uses OAEP, which is random oracle based, assuming
the use of a random oracle for our non-interactive version makes sense).
For concreteness, suppose that N is 768 bits and $N_1$ is 1024 bits. Also, suppose
that m = 40. As shown in Appendix B, the non-interactive zero-knowledge
transcript requires about 28 kilobytes in this case.

So, with respect to previous work on this problem we argue that our solution 
has the following novel features: 

1. High PKCS #1 Compatibility: The user’s modulus N needs to be a
Blum integer, a check that is easy to implement inside the key generation, as
argued above.

2. Weaker Theoretical Security Assumption: Unlike previous proposals,
which utilize new assumptions such as the composite residuosity class assumption,
we only assume that factoring is hard.

3. Protection Against Adaptive Attacks: The escrow authority can use a
key and corresponding cryptosystem that is secure against adaptive chosen
ciphertext attacks. This allows the system to be used in applications where
the system is exposed to such attacks.



2 Notation and Definitions 

Let $J(x/N)$ denote the Jacobi symbol of x with respect to N. Let $\mathbb{Z}_N^*$ denote
the set of integers contained in $\{0, 1, 2, \ldots, N - 1\}$ that are relatively prime to
N. Also, let $\mathbb{Z}_N^*(+1)$ denote those elements in $\mathbb{Z}_N^*$ with Jacobi symbol 1 with
respect to N. Recall that a is a pseudosquare mod N provided that a is not
a quadratic residue mod N and that $a \in \mathbb{Z}_N^*(+1)$. Two roots $d_0$ and $d_1$ of
a quadratic residue a mod N are called ambivalent roots of a if $d_0 \not\equiv d_1$ and
$d_0 \not\equiv -d_1 \pmod{N}$. A number N is square-free if $m^2$ does not divide N for any
m > 1. If the prime powers appearing in the factorization of N all have odd
exponents, then we say that N is free of squares (not to be confused with being
square-free).

The proof system of van de Graaf and Peralta uses the following definitions
for Blum integers and class II integers. A Blum integer is defined as the product
of two prime powers $p^r$ and $q^s$ such that $p, q \equiv 3 \bmod 4$ with r and s odd.
The composite $N = p^r q^s$ is a class II integer if p, q are prime, $p \equiv 3 \bmod 4$,
$q \equiv 1 \bmod 4$, and s is odd. Their proof system is based on the following key
observation: If N is a Blum integer, then for every $x \in \mathbb{Z}_N^*(+1)$ you are guaranteed that x or −x is a
quadratic residue mod N, whereas if N is neither a Blum integer nor of class II, then
the probability that one or both of x and −x are quadratic residues mod N is
less than or equal to 1/2. This observation allows the prover to convince the
verifier that N is a Blum or a class II integer. A method based on computing
Jacobi symbols is used to convince the verifier that N is not a class II integer.
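The key observation can be checked numerically on toy moduli. The sketch below (our own illustration) implements the Jacobi symbol with the usual binary algorithm based on quadratic reciprocity and, by brute force, measures how often x or −x is a quadratic residue for $x \in \mathbb{Z}_N^*(+1)$. The moduli $77 = 7 \cdot 11$ (a Blum integer) and $65 = 5 \cdot 13$ (both primes $\equiv 1 \bmod 4$, so neither Blum nor class II) are arbitrary examples, and the helper names are hypothetical.

```python
from math import gcd


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol J(a/n) for odd n > 0 (binary algorithm with quadratic reciprocity)."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:  # J(2/n) = -1 exactly when n = 3 or 5 mod 8
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # reciprocity: the sign flips when both are 3 mod 4
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def fraction_x_or_minus_x_is_qr(N: int) -> float:
    """Fraction of x in Z_N^*(+1) for which x or -x is a quadratic residue mod N."""
    units = [x for x in range(1, N) if gcd(x, N) == 1]
    squares = {x * x % N for x in units}
    plus_one = [x for x in units if jacobi(x, N) == 1]
    hits = sum(1 for x in plus_one if x in squares or (N - x) in squares)
    return hits / len(plus_one)


if __name__ == "__main__":
    print(fraction_x_or_minus_x_is_qr(7 * 11))  # 1.0 (Blum integer)
    print(fraction_x_or_minus_x_is_qr(5 * 13))  # 0.5 (neither Blum nor class II)
```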

In the sequel, we will let N = VW where W is the part of N that is free of
squares and V is a square. Let w denote the number of distinct prime factors
of W and let v denote the number of distinct prime factors of V. From van de
Graaf and Peralta [16] (page 130) we have the following fact.

Fact 1: If $a \in \mathbb{Z}_N^*(+1)$ then $a \in QR_N$ with probability $(1/2)^{v+w-1}$.

The well-known definitions of ensembles, polynomial indistinguishability, and
multiple-message indistinguishability of encryptions are given in Appendix A. In
this paper we will only consider adversaries contained in BPP.

Let $E_e(\cdot, \cdot)$ denote a randomized single-message semantically secure encryption
function in the uniform model that uses public key e (in fact, $E_e$ can be
OAEP, which is secure against adaptive chosen ciphertext attacks [9]; it can be
based on RSA). Let $D_e(\cdot)$ denote the corresponding decryption function. Let M
be the message space and let $S_1$ denote the set from which the random string is
drawn to form the probabilistic encryption. Let C denote the ciphertext space.
To encrypt $m \in M$ to get the ciphertext $c \in C$, we choose $r \in_R S_1$ and compute
$c = E_e(m, r)$. Thus, $m = D_e(E_e(m, r))$.
3 Background: ZK Proof That N Is a Blum Integer

All of the theorems and algorithms in this section were taken directly from van
de Graaf and Peralta [16]; a reader familiar with the system can skip it. N is
a Blum integer if $N = p^r q^s$, p and q are distinct primes, r and s are odd, and
$p \equiv q \equiv 3 \bmod 4$. The following theorems provide the basis for proving the form
of N.

Theorem 1. Suppose N is a Blum integer or of class II. Let a be a random
number in $\mathbb{Z}_N^*(+1)$. Then either a or −a is a quadratic residue mod N.

Theorem 2. Suppose $N \equiv 1 \bmod 4$ and N has more than two distinct prime
factors. If a is a random element in $\mathbb{Z}_N^*(+1)$, then the probability that at least
one of a or −a is a quadratic residue modulo N is less than or equal to 1/2.

The following protocol proves that N is a Blum integer.

AtomicProtocol For N:

1. P and V choose $a \in_R \mathbb{Z}_N^*(+1)$ together randomly

2. P and V choose $\mu \in_R \{1, -1\}$ together randomly

3. P sends V a square root $\beta$ of a or −a with Jacobi symbol equal to $\mu$

4. V accepts iff all of the following hold:

5. N is not a square and N is not a prime power

6. $N \equiv 1 \bmod 4$

7. $\beta^2 \equiv a \bmod N$ or $\beta^2 \equiv -a \bmod N$

8. $J(\beta/N) = \mu$
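An honest run of the protocol can be sketched for a toy Blum integer, assuming the prover knows the factorization $N = pq$ with distinct primes $p \equiv q \equiv 3 \pmod 4$, so that a square root of a residue y modulo p is $y^{(p+1)/4} \bmod p$. Two simplifications are specific to this sketch: the joint coin flipping of steps 1 and 2 is replaced by local sampling, and the check of step 5 on the form of N is omitted. The helper names are hypothetical.

```python
import secrets
from math import gcd


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol J(a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def crt(rp: int, rq: int, p: int, q: int) -> int:
    """The unique x mod p*q with x = rp mod p and x = rq mod q."""
    return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % (p * q)


def prover_beta(a: int, mu: int, p: int, q: int) -> int:
    """Step 3: a square root of a or -a mod N = p*q whose Jacobi symbol is mu."""
    N = p * q
    for y in (a % N, -a % N):
        rp, rq = pow(y, (p + 1) // 4, p), pow(y, (q + 1) // 4, q)
        if rp * rp % p != y % p or rq * rq % q != y % q:
            continue  # y is not a quadratic residue mod N
        for sp in (rp, p - rp):
            for sq in (rq, q - rq):
                beta = crt(sp, sq, p, q)
                if jacobi(beta, N) == mu:
                    return beta
    raise ValueError("no root found: is N a Blum integer and J(a/N) = 1?")


def verifier_accepts(N: int, a: int, mu: int, beta: int) -> bool:
    """Steps 6-8; the step 5 check on the form of N is omitted in this sketch."""
    square_ok = (beta * beta - a) % N == 0 or (beta * beta + a) % N == 0
    return N % 4 == 1 and square_ok and jacobi(beta, N) == mu


if __name__ == "__main__":
    p, q = 19, 23  # toy Blum integer N = 437 (both primes are 3 mod 4)
    N = p * q
    a = 0
    while gcd(a, N) != 1 or jacobi(a, N) != 1:  # step 1, sampled locally
        a = secrets.randbelow(N)
    mu = secrets.choice((1, -1))  # step 2, sampled locally
    beta = prover_beta(a, mu, p, q)
    print(verifier_accepts(N, a, mu, beta))  # True
```

For a Blum integer, a and −a share the Jacobi symbol and exactly one of them is a square, and its four roots split evenly between Jacobi symbols 1 and −1, which is why the prover always finds a suitable $\beta$.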

The value $N = p^3 q^3$, where p and q are prime, can form a valid Blum integer.
Hence, in this example $N = (pq)^3$ is a perfect cube. It is straightforward to add
an additional check to verify that N is not a perfect power (as explained below).

Note that one need only show zero-knowledge with respect to an honest veri- 
fier since the verifier does not send any challenges to the prover (rather a random 
challenge is jointly chosen; typically using a simulatable coin flipping protocol). 
Honest verifier zero-knowledge can be shown using a standard simulation argu- 
ment. 



4 Interactive Atomic Proof of Recoverability 

Recently it has been shown that key recovery systems for public key infrastruc- 
tures can be implemented as efficiently in terms of protocol overhead as PKI 
systems without a key recovery mechanism [30,31,32]. This is accomplished by 
having the user who generates his or her own key pair provide the CA with a 
public key and certificate of recoverability prior to having the user’s public key 
certified. A certificate of recoverability certifies that the private key is recover- 
able by a designated key recovery authority given the public key and certificate 
of recoverability. The Certificate Authority certifies the public key if and only if
the certificate of recoverability is valid. This is the approach that is taken here. 

Bellare and Rogaway formalized the Fiat-Shamir heuristic into what is known 
as the random oracle model [2]. In section 5.2 of that paper, they give a generic 
construction that shows how to turn any three round computational ZK atomic 
proof system achieving error probability 1/2 into a non-interactive proof system 
using a random oracle. They prove that the resulting proof system is computa- 
tional zero-knowledge in the random oracle model, and assume that the first and 
last messages are from the prover to the verifier, and that the middle message is 
from the verifier to the prover. More specifically, they prove that to achieve an
error of $2^{-k(n)}$ in the non-interactive proof, $2k(n)$ iterations are sufficient. Here
n is a security parameter and $k(n) = \omega(\log n)$ is given.

There are two important things to note about section 5.2. First, it is assumed 
in the simplifying assumptions section that the atomic protocol is only honest 
verifier zero-knowledge. This is evidenced by the fact that the ensemble for the 
simulator S' is constructed with a bit b chosen uniformly at random (not by a 
subroutine V* representing a potentially cheating verifier). Thus, the proof of 
zero knowledge of the resulting non-interactive proof in section 5.2 simulates the 
view of an honest verifier since that is all that is necessary in the non-interactive
random oracle case. Hence, we can assume that V can be trusted to generate
truly random values (e.g., values uniformly at random from $\mathbb{Z}_N^*(+1)$) in the
interactive atomic protocol.

Second, upon close inspection of their simplifying assumptions it is clear 
that they have in fact shown this not only for perfect ZK proof systems, but 
for computational ZK proof systems as well. This is because they assume that 
the ensemble corresponding to transcripts produced between the prover and the 
verifier and the ensemble corresponding to the simulated transcripts are only 
computationally indistinguishable. This is important since the use of the seman- 
tically secure encryption function only guarantees computational indistin- 
guishability of the two transcript ensembles. 

4.1 The Atomic Protocol 

The intuition behind the recovery aspect of the protocol is as follows. The prover 
is given the public key of the escrow authority. The prover has a Blum integer N 
and wishes to prove to a verifier that a factor of N is recoverable by the escrow 
authority. To do so, the prover generates two ambivalent roots of a quadratic 
residue modulo N. One root has Jacobi 1 and the other has Jacobi —1. These 
roots are encrypted under the public key of the escrow authority. The two ci- 
phertexts along with the quadratic residue are given to the verifier. The verifier 
chooses a sign 1 or —1 at random and forces the prover to open a ciphertext 
containing a root that has a Jacobi that matches the randomly chosen sign. 

Let $N_1$ be the public key of the escrow authority. Recall that this protocol
requires that N be a Blum integer that is not a perfect power. The atomic
protocol will now be described.
SQRT AtomicProtocol:

1. P chooses $\nu \in_R QR_N$

2. P computes $d_0$ to be a root of $\nu$ with Jacobi symbol 1

3. P computes $d_1$ to be a root of $\nu$ with Jacobi symbol −1

4. P chooses $r_0, r_1 \in_R S_1$

5. P computes $c_j = E_{N_1}(d_j, r_j)$ for j = 0, 1

6. P sends V the tuple $(N, \nu, c_0, c_1)$

7. V sends $(b, \mu, a)$ to P where $b \in_R \{0, 1\}$, $\mu \in_R \{1, -1\}$, $a \in_R \mathbb{Z}_N^*(+1)$

8. P computes a square root $\beta$ of a or −a with Jacobi symbol equal to $\mu$

9. P sends V the tuple $(D, R, \beta) = (d_b, r_b, \beta)$

10. V accepts iff all of the following hold:

11. N is not a prime and N is not a perfect power

12. $N \equiv 1 \bmod 4$

13. $\beta^2 \equiv a \bmod N$ or $\beta^2 \equiv -a \bmod N$

14. $J(\beta/N) = \mu$

15. $D \in \mathbb{Z}_N^*$

16. $D^2 \equiv \nu \bmod N$

17. $J(D/N) = (-1)^b$

18. $R \in S_1$ and $c_b = E_{N_1}(D, R)$
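The commit and opening phases of SQRT AtomicProtocol can be sketched as below. The escrow encryption $E_{N_1}$ is deliberately replaced by a hypothetical stand-in, `stand_in_encrypt`, which is a SHA-256 hash of the root and the randomness: it is not an encryption scheme and the escrow authority could not decrypt it; it only lets the sketch exercise step 18, where $c_b$ is recomputed from $(D, R)$. The prover takes $\nu = x^2$ for a random unit x, which yields a random element of $QR_N$, and reads the four roots of $\nu$ off x via the CRT. Steps 7, 8 and 11 to 14 coincide with AtomicProtocol For N and are not repeated; the membership test $R \in S_1$ is also omitted.

```python
import hashlib
import secrets
from math import gcd


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol J(a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def crt(rp: int, rq: int, p: int, q: int) -> int:
    """The unique x mod p*q with x = rp mod p and x = rq mod q."""
    return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % (p * q)


def stand_in_encrypt(d: int, r: bytes) -> bytes:
    """HYPOTHETICAL stand-in for E_{N_1}(d, r): a SHA-256 hash, not an encryption."""
    return hashlib.sha256(str(d).encode() + b"|" + r).digest()


def prover_commit(p: int, q: int):
    """Steps 1-5: nu in QR_N, roots d_0 (Jacobi 1) and d_1 (Jacobi -1), c_0 and c_1."""
    N = p * q
    x = 0
    while gcd(x, N) != 1:
        x = secrets.randbelow(N)
    nu = x * x % N  # step 1: a random quadratic residue
    roots = [crt(sp, sq, p, q) for sp in (x % p, -x % p) for sq in (x % q, -x % q)]
    d = (next(r for r in roots if jacobi(r, N) == 1),   # step 2
         next(r for r in roots if jacobi(r, N) == -1))  # step 3
    rand = (secrets.token_bytes(16), secrets.token_bytes(16))  # step 4
    c = (stand_in_encrypt(d[0], rand[0]), stand_in_encrypt(d[1], rand[1]))  # step 5
    return nu, d, rand, c


def check_opening(N: int, nu: int, c, b: int, D: int, R: bytes) -> bool:
    """Steps 15-18 for the opened pair (D, R) = (d_b, r_b)."""
    return (gcd(D, N) == 1                             # step 15
            and D * D % N == nu                        # step 16
            and jacobi(D, N) == (1 if b == 0 else -1)  # step 17
            and stand_in_encrypt(D, R) == c[b])        # step 18 (stand-in)


if __name__ == "__main__":
    p, q = 19, 23  # toy Blum integer
    nu, d, rand, c = prover_commit(p, q)
    b = secrets.randbelow(2)  # honest verifier's challenge bit (step 7)
    print(check_opening(p * q, nu, c, b, d[b], rand[b]))  # True
```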



It is well known that using a binary search, N can be tested for being a 
perfect power in time 0{{log^ N) log log log N) [19]. Improvements to perfect 
power testing have been given [4] where the worst case running time is shown 
to be 0{log^ N) using a modification of Newton’s method. 
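A straightforward perfect-power test by binary search can be sketched as follows. It is a simple illustration only, not the optimized algorithms of [19] or [4] and without their running-time guarantees: for each exponent k from 2 up to $\log_2 N$ it computes $\lfloor N^{1/k} \rfloor$ by binary search and checks whether its k-th power equals N. The function names are hypothetical.

```python
def integer_root(x: int, k: int) -> int:
    """Largest r >= 0 with r**k <= x, found by binary search (x >= 0, k >= 1)."""
    lo, hi = 0, 1 << (x.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


def is_perfect_power(n: int) -> bool:
    """True iff n = r**k for some integers r > 1 and k > 1."""
    for k in range(2, n.bit_length() + 1):  # r >= 2 forces k <= log2(n)
        r = integer_root(n, k)
        if r > 1 and r ** k == n:
            return True
    return False


if __name__ == "__main__":
    print(is_perfect_power(7 * 11), is_perfect_power((7 * 11) ** 3))  # False True
```

The second call corresponds to the $N = (pq)^3$ example above, which is a Blum integer but must be rejected.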

Recall that a non-trivial factor of N is a factor that is less than N and greater
than 1. The escrow authority attempts to recover a non-trivial factor $\psi$ of N
by computing $\psi_1 = \gcd(d_0 + d_1, N)$ and $\psi_2 = \gcd(d_0 - d_1, N)$. The reason this
works is the following. Let $\omega, -\omega$ be the two non-trivial square roots of unity mod N. It follows
that $d_1 \equiv \pm\omega d_0 \Rightarrow d_0^2 - (\pm\omega d_0)^2 \equiv 0 \bmod N \Rightarrow d_0^2 - d_1^2 = kN$ for some integer
k. Hence, $N \mid (d_0 - d_1)(d_0 + d_1)$. But N does not divide $d_0 - d_1$ and does
not divide $d_0 + d_1$. Thus, $\psi_1$ or $\psi_2$ must be a non-trivial factor of N. It is
not hard to show that when N is a Blum integer with no integer roots (not a
perfect power), a non-trivial factor of N can be used to efficiently recover the
full factorization of N.
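The escrow authority's gcd computation can be sketched as follows. This is a hypothetical helper for illustration; for the two-prime case $N = pq$ a single non-trivial factor immediately gives the full factorization.

```python
from math import gcd
from typing import Optional


def recover_factor(d0: int, d1: int, N: int) -> Optional[int]:
    """Return a non-trivial factor among psi_1 = gcd(d0 + d1, N), psi_2 = gcd(d0 - d1, N)."""
    for psi in (gcd(d0 + d1, N), gcd(d0 - d1, N)):
        if 1 < psi < N:
            return psi
    return None  # d1 = +-d0 mod N: the roots are not ambivalent


if __name__ == "__main__":
    N = 7 * 11
    # 2 and 9 are both square roots of 4 mod 77 (9 = 2 mod 7 and 9 = -2 mod 11).
    f = recover_factor(2, 9, N)
    print(f, N // f)  # 11 7
```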

The following is the transcript of the protocol. 

$t_{PV} = (N, \nu, c_0, c_1, b, \mu, a, D, R, \beta)$



Theorem 3. If N = pq where p and q are distinct primes and −1 is a pseudosquare
mod N, then P can always compute a square root $\beta$ in step 8.

Proof. This follows immediately from Theorem 1. □



From Theorem 3 it is not hard to see that the atomic protocol is complete. 
4.2 Security of the Atomic Protocol

The intuition behind the proof of zero knowledge is as follows. First, a probabilistic
poly-time simulator S is given. It is then argued that the transcripts
it produces are polynomially indistinguishable from true transcripts. This is
proven by assuming otherwise, thereby implying the existence of a transcript
distinguisher D. A probabilistic poly-time algorithm D' is then described that
uses D as an oracle to break the single-message indistinguishability of E in the
uniform model, thereby reaching a contradiction. The key aspect behind the construction
of D' is as follows. D' takes as input $(X_n, Y_n, p, q, e, c_e)$, where $X_n$ is a
random variable corresponding to the unopened plaintext in a true transcript and
$Y_n$ is a random variable corresponding to the unopened plaintext in a simulated
transcript; $Y_n = 1$ with probability unity. The primes p and q constitute a
priori information concerning these plaintexts. The value e is the public encryption
key, and $c_e$ is the ciphertext that we are trying to classify as an encryption of
$X_n$ or of $Y_n$. The algorithm D' constructs exactly one transcript and
has no idea whether it corresponds to a simulated or a true transcript. It does this
by using (p, q) to compute a root of $X_n^2 \bmod pq$ which is ambivalent with respect
to $X_n$. This root is then encrypted under e to obtain a second ciphertext. If $c_e$
encrypts $X_n$, then the resulting transcript is a true transcript. If $c_e$ encrypts 1,
then the resulting transcript is a simulated one. D' gives the resulting transcript
to D to let D decide. The output of D is the final output.

Theorem 4. (computational ZK - uniform model) If E is single-message se- 
mantically secure then SQRT AtomicProtocol is honest verifier computational 
zero-knowledge. 

Proof (Sketch). Assume that E is single-message semantically secure in the
uniform model. Then, by Theorem 5.2.15 of Goldreich, E is single-message
secure in the sense of ciphertext indistinguishability in the uniform
model, since the single-message case is a special case of that theorem. Let V be the
honest verifier. Consider the following simulator.

S(N):

1. choose $b \in_R \{0, 1\}$, $t \in_R \{1, -1\}$, and $r_0, r_1 \in_R S_1$

2. if b = 0 then set u = 1 else set u = −1

3. choose $D \in_R \mathbb{Z}_N^*$ s.t. $J(D/N) = u$

4. compute $\nu = D^2 \bmod N$

5. choose $\beta \in_R \mathbb{Z}_N^*$

6. compute $\mu = J(\beta/N)$ and $a = t\beta^2 \bmod N$

7. compute $c_b = E_{N_1}(D, r_b)$ and $c_{1-b} = E_{N_1}(1, r_{1-b})$

8. output $t_S = (N, \nu, c_0, c_1, b, \mu, a, D, r_b, \beta)$ and halt

Let $U_{PV}$ denote the probability ensemble over the transcripts $t_{PV}$. Let $U_S$
denote the probability ensemble over the transcripts $t_S$. These two ensembles
are polynomially indistinguishable. To show this, assume for the sake of contradiction
that they are not. Then it follows from Definition 2 that there exists a
distinguisher $D \in BPP$, a polynomial p, and a sufficiently large n such that
D distinguishes these two ensembles correctly with advantage at least 1/p(n).
Hence,

$$\Pr[D(U_{PV}, 1^n) = 1] - \Pr[D(U_S, 1^n) = 1] \ge 1/p(n).$$

It will be shown how to use D as an oracle in an algorithm D' that distinguishes
single-message encryptions. Let $c_e$ denote an encryption of $X_n$ or $Y_n$.

D'(Z_n, e, c_e):

1. if $Z_n$ is not a 4-tuple then output a random bit and halt

2. set $(X_n, Y_n, p, q) = Z_n$ and compute N = pq

3. compute $\nu = X_n^2 \bmod N$

4. compute D randomly s.t. $D^2 \equiv \nu \bmod N$ and s.t. $J(D/N) = -J(X_n/N)$

5. if J(D/N) = 1 then set b = 0 else set b = 1

6. set $c_{1-b} = c_e$

7. choose $r_b \in_R S_1$, $\beta \in_R \mathbb{Z}_N^*$, and $t \in_R \{1, -1\}$

8. compute $\mu = J(\beta/N)$ and $a = t\beta^2 \bmod N$

9. compute $c_b = E_{N_1}(D, r_b)$

10. set $t = (N, \nu, c_0, c_1, b, \mu, a, D, r_b, \beta)$

11. output $D(t, 1^n)$ and halt

Let $X_n$ denote the probability distribution over the plaintext messages in $c_e$
corresponding to the prover P interacting with V. Let $Y_n$ denote the probability
distribution over the plaintext messages in the unopened value $c_e$ corresponding
to the simulator S. In this case $\Pr[Y_n = 1] = 1$. Define the random variable
$W_n$ such that $\Pr[W_n = (p, q)] = 1$ where N = pq. Finally, take $Z_n = X_n Y_n W_n$.
By taking $G_1(1^n) = N_1$ it is not hard to see that

$$\Pr[D'(Z_n, N_1, E_{N_1}(X_n)) = 1] = \Pr[D(U_{PV}, 1^n) = 1]$$

and

$$\Pr[D'(Z_n, N_1, E_{N_1}(Y_n)) = 1] = \Pr[D(U_S, 1^n) = 1].$$

Hence, D' distinguishes the encryptions with non-negligible advantage. But
this implies that E is not secure in the sense of indistinguishability for single
messages in the uniform setting. Hence, a contradiction has been reached. So,
the assumption that the two ensembles are distinguishable is wrong. □

Define L to be the set of Blum integers that are not perfect powers. 

Theorem 5. If N ^ L or P does not provide a correct pair of roots (to the 
escrow authority), then P passes with probability at most 1/2. 

Proof. Assume that $N \notin L$ or that P does not provide a correct pair of roots. From van de Graaf and Peralta's proof of soundness [16] it follows that if N is not a Blum integer then P passes with probability at most 1/2. If N is a perfect power then P passes with probability 0. Hence, if $N \notin L$ then P passes with probability at most 1/2. It remains to consider the case that $N \in L$ but P does not provide a correct pair of roots. Since $N = p^r q^s \in L$ with $p \equiv q \equiv 3 \pmod 4$, we have $J(-1/p) = J(-1/q) = -1$, and it follows that $J(-1/p^r) = J(-1/q^s) = -1$ because r and s are odd.

Consider the case that neither $c_0$ nor $c_1$ encrypts a root of v. In this case P fails the proof with probability 1 due to step 16. Now consider the case that only one root is encrypted in the pair of encryptions $(c_0, c_1)$. Let $p_0$ denote the probability that $c_0$ is the proper encryption of a root. Note that $p_0$ may depend on the transcript generated thus far. Since the verifier is honest, V asks for $c_0$ to be opened with probability 1/2. With probability $1 - p_0$, $c_1$ is the proper encryption of a root. It follows that the probability that P fools V in step 16 is at most $p_0 (1/2) + (1 - p_0)(1/2) = 1/2$.

Finally, consider the case that $c_0$ and $c_1$ both encrypt square roots. Since P is assumed to be cheating, the plaintexts in $c_0$ and $c_1$ cannot have differing Jacobi symbols; otherwise the roots would form an ambivalent pair of roots. Since the verifier chooses the challenge honestly, it follows that the probability that P fools V in step 17 is at most 1/2.

It has been shown that in all cases P passes with probability at most 1/2. □

Observe that the atomic protocol is a standard three-round protocol in which a message is sent from P to V, then from V to P, and finally from P to V. It was shown to be honest-verifier computational ZK and sound with error 1/2. It follows that the transformation into NIZK of Bellare and Rogaway in Section 5.2 applies directly.

5 Conclusion 

A key recovery system for factoring-based private keys was presented. It was proven secure in the random oracle model based on the integer factorization problem, and it was shown to be highly compatible with existing PKCS implementations. The scheme is meant as an alternative to existing schemes, since it is based directly on the difficulty of factoring.
A Appendix: Cryptographic Definitions

Recall from probability theory that a random variable is formally defined as a real-valued function defined over a sample space [26].

Definition 1 (ensembles): Let I be a countable index set. An ensemble indexed by I is a sequence of random variables indexed by I. Namely, $X = \{X_i\}_{i \in I}$, where the $X_i$'s are random variables, is an ensemble indexed by I.

If X is a random variable and f is a function, then f(X) is the random variable defined by evaluating f on an input chosen according to X [18]. Thus, f(X, X) would normally represent a random variable defined by evaluating f on a pair where each value in the pair is chosen according to X. However, in this work we adopt the following convention from Section 1.2.1 of [14]: all occurrences of the same symbol in a probabilistic statement refer to the same (unique) random variable. Thus, f(X, X) refers to the random variable obtained by evaluating f on an input pair (x, x), where x is chosen according to X.

The following is Definition 3.2.2 of O. Goldreich. 

Definition 2 (polynomial-time indistinguishability): Two ensembles $X \stackrel{\mathrm{def}}{=} \{X_n\}_{n \in \mathbb{N}}$ and $Y \stackrel{\mathrm{def}}{=} \{Y_n\}_{n \in \mathbb{N}}$ are indistinguishable in polynomial time if for every probabilistic polynomial-time algorithm D, every polynomial $p(\cdot)$, and all sufficiently large n,

$$\left| \Pr[D(X_n, 1^n) = 1] - \Pr[D(Y_n, 1^n) = 1] \right| < \frac{1}{p(n)}.$$

Goldreich distinguishes between uniform adversaries (those contained in BPP) and non-uniform adversaries (those contained in P/poly). For an explanation of these types of adversaries, see [12,13]. An ensemble $X = \{X_n\}_{n \in \mathbb{N}}$ is said to be polynomial-time constructible if there exists a probabilistic polynomial-time algorithm S such that for every n, the random variables $S(1^n)$ and $X_n$ are identically distributed.

Let $G(1^n)$ be a key pair generator. Define $G(1^n) = (G_1(1^n), G_2(1^n))$, where $G_1(1^n)$ is the public key and $G_2(1^n)$ is the private key. We may assume that $|G_1(1^n)|$ and $|G_2(1^n)|$ are polynomially related to n. The following definition is from [15].







Definition 3 (indistinguishability of encryptions, uniform complexity): An encryption scheme (G, E, D) has uniformly indistinguishable encryptions in the public-key model if for every polynomial t, every probabilistic polynomial-time algorithm D', and every polynomial-time constructible ensemble $T \stackrel{\mathrm{def}}{=} \{T_n = X_n Y_n Z_n\}_{n \in \mathbb{N}}$, with $X_n = (X_n^{(1)}, \ldots, X_n^{(t(n))})$, $Y_n = (Y_n^{(1)}, \ldots, Y_n^{(t(n))})$, and $|X_n^{(i)}| = |Y_n^{(i)}| = \mathrm{poly}(n)$,

$$\left| \Pr[D'(Z_n, G_1(1^n), E_{G_1(1^n)}(X_n)) = 1] - \Pr[D'(Z_n, G_1(1^n), E_{G_1(1^n)}(Y_n)) = 1] \right| < \frac{1}{p(n)}$$

for every positive polynomial p and all sufficiently large n's.

We stress that $X_n$ is a sequence of random variables, which may depend on one another. The random variable $Z_n$ captures a priori information about the plaintexts for which encryptions should be distinguished.

B Appendix: Storage Requirements 

Let H be a random function such that

$$H : \{0,1\}^* \rightarrow \{0,1\}^{2m} \times \{1,-1\}^{2m} \times \cdots$$

The amount of information sent from P to V can be reduced because of redundancy. This is illustrated in the NIZK version of the protocol given below.

SQRTNIZKProof(N):

1. for i = 1 to 2m do:
2.   P chooses $v_i \in_R QR_N$
3.   P computes $d_{i,0}$ to be a root of $v_i$ with Jacobi symbol 1
4.   P computes $d_{i,1}$ to be a root of $v_i$ with Jacobi symbol $-1$
5.   P chooses $r_{i,0}, r_{i,1} \in_R S_1$
6.   P computes $c_{i,j} = E_{N_1}(d_{i,j}, r_{i,j})$ for j = 0, 1
7. P computes $(b_1, \ldots, b_{2m}, \mu_1, \ldots, \mu_{2m}, a_1, \ldots, a_{2m}) = H(N, (v_1, c_{1,0}, c_{1,1}), \ldots, (v_{2m}, c_{2m,0}, c_{2m,1}))$
8. for i = 1 to 2m do:
9.   P computes a square root $f_i$ of $a_i$ or $-a_i$ with Jacobi symbol equal to $\mu_i$
10. P sends to V the tuple $(N, (c_{1,1-b_1}, d_{1,b_1}, r_{1,b_1}, f_1), \ldots, (c_{2m,1-b_{2m}}, d_{2m,b_{2m}}, r_{2m,b_{2m}}, f_{2m}))$
11. for i = 1 to 2m do:
12.   V computes $v_i = d_{i,b_i}^2 \bmod N$
13.   V computes $c_{i,b_i} = E_{N_1}(d_{i,b_i}, r_{i,b_i})$
14. V computes $(b_1, \ldots, b_{2m}, \mu_1, \ldots, \mu_{2m}, a_1, \ldots, a_{2m}) = H(N, (v_1, c_{1,0}, c_{1,1}), \ldots, (v_{2m}, c_{2m,0}, c_{2m,1}))$
15. V accepts iff all of the following hold:
16.   N is not a prime and N is not a perfect power
17.   $N \equiv 1 \pmod 4$
18.   $f_i^2 \equiv a_i \pmod N$ or $f_i^2 \equiv -a_i \pmod N$ for i = 1, 2, ..., 2m
19.   $J(f_i/N) = \mu_i$ for i = 1, 2, ..., 2m
20.   $d_{i,b_i} \in \ldots$ for i = 1, 2, ..., 2m
21.   $J(d_{i,b_i}/N) = (-1)^{b_i}$ for i = 1, 2, ..., 2m
22.   $r_{i,b_i} \in S_1$ for i = 1, 2, ..., 2m
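The root selection in steps 3 and 4 and the check in step 21 rely on the fact that the four square roots of a quadratic residue modulo a product of two primes $p \equiv q \equiv 3 \pmod 4$ split evenly between Jacobi symbol $+1$ and $-1$. The Python sketch below illustrates this on the toy modulus $N = 7 \cdot 11$, which is far too small for real use; the helper names `jacobi` and `square_roots` are hypothetical and not part of the scheme.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol J(a/n) for odd n > 0, via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # remove factors of 2 using J(2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def square_roots(v: int, p: int, q: int) -> list[int]:
    """All square roots of a quadratic residue v mod N = p*q, for primes p, q = 3 (mod 4)."""
    n = p * q
    rp = pow(v, (p + 1) // 4, p)       # one root of v mod p
    rq = pow(v, (q + 1) // 4, q)       # one root of v mod q
    roots = set()
    for sp in (rp, p - rp):
        for sq in (rq, q - rq):
            # Chinese remaindering: x = sp (mod p), x = sq (mod q)
            roots.add((sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % n)
    return sorted(roots)


p, q = 7, 11                           # toy Blum integer, far too small for real use
N = p * q
v = 4                                  # v = 2^2 is a quadratic residue mod N
roots = square_roots(v, p, q)
print(roots, [jacobi(r, N) for r in roots])   # [2, 9, 68, 75] [-1, 1, 1, -1]
```

Here 9 and 68 have Jacobi symbol $+1$ and could serve as $d_{i,0}$, while 2 and 75 have Jacobi symbol $-1$ and could serve as $d_{i,1}$. Since both factors are $3 \bmod 4$, the product also satisfies $N \equiv 1 \pmod 4$, as required by check 17.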



For concreteness, suppose that $N_1$ is a 1024-bit RSA key, $E_{N_1}$ is OAEP, and N is a 768-bit public key. Also suppose that the random string used in OAEP is about 256 bits in length. It follows that 1024 + 768 + 256 + 768 = 2816 bits of information are transmitted from P to V in iteration i, corresponding to $(c_{i,1-b_i}, d_{i,b_i}, r_{i,b_i}, f_i)$. Recall that there are 2m iterations. Taking m = 40, the 2m iterations occupy 28,160 bytes.
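The byte count above can be recomputed with a short Python function. The function name and its keyword defaults are hypothetical; the defaults simply encode the example sizes assumed in the text, and, as in the estimate above, the single transmission of N itself is not counted.

```python
def nizk_size_bytes(m: int, n1_bits: int = 1024, n_bits: int = 768,
                    oaep_random_bits: int = 256) -> int:
    """Bytes sent from P to V by SQRTNIZKProof over its 2m iterations."""
    # Per iteration: c_{i,1-b_i} (ciphertext under N_1), d_{i,b_i} (mod N),
    # r_{i,b_i} (OAEP randomness) and f_i (mod N).
    bits_per_iteration = n1_bits + n_bits + oaep_random_bits + n_bits
    return 2 * m * bits_per_iteration // 8

print(nizk_size_bytes(40))   # 28160
```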
Abstract. One of the main objectives of server-assisted computation is to reduce the cost of generating public key signatures for ordinary users with constrained devices. On the other hand, one-time signatures, which are based on nothing more than a one-way function, provide an attractive alternative to public key signatures. This paper revisits server-assisted computation for digital signatures and presents the server-assisted one-time signature (SAOTS) protocol, which combines the benefits of these two efficiency solutions. The proposed protocol turns out to be more efficient in both computation and rounds than previous verifiable-server approaches. In addition, SAOTS offers other advantages such as verification transparency, the elimination of public key operations for the ordinary user, and the ability to prove the server's cheating without storing the signatures.

Keywords: server-assisted signature, one-time signature, digital signature, pervasive computing.



1 Introduction 

Broadly speaking, the most important design criterion for any kind of security protocol should of course be “security”. Note that absolute security may be impossible to reach in practice, so the level of security achieved is relative. The second criterion to consider in a security protocol is “efficiency”: skill in avoiding wasted time, effort and other resources.

In the early days of computers, when saving a few clock cycles mattered and the available cryptographic tools were unoptimized and very slow, being inefficient might mean being unusable. Today, in spite of high-performance computing and very fast cryptographic algorithms, efficiency remains a significant concern in designing a security protocol, because more efficient security protocols make three important targets attainable:

— Cost cuts are possible e.g., by buying a less expensive and less powerful 
server machine. 

— The “pervasive computing” vision, in which computer applications are hosted on a wide range of platforms, including many that are small, mobile and regarded today as having only limited computational capabilities, can be realized securely.

— The security level can be increased, e.g., by using longer keys.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 143-156, 2004. 

© Springer-Verlag Berlin Heidelberg 2004
Digital signatures, which have quickly found applications in many spheres of computing, are among the most fundamental and valuable building blocks of security protocols. Basically, a digital signature provides three essential security services at once:

Authentication: assurance of the identity of the signer. 

Integrity: assurance that the message is not altered after it is signed. 

Nonrepudiation: blocking a sender’s false denial that he or she signed a 
particular message, thus enabling the receiver to easily prove that the sender 
actually did sign the message. 

While there are other means, such as message authentication codes (MACs), to ensure data integrity and authentication, digital signatures are better in one important respect: they can be used to solve the nonrepudiation problem. Moreover, the MAC approach is inadequate in a multicast setting because it is based on a secret shared among the participants.

Most current techniques for generating digital signatures are based on public key cryptography, which relies on hard mathematical problems such as factoring or discrete logarithms (e.g., RSA [1] or DSS [2]), but it is well known that public key cryptography has efficiency problems. Moreover, some mobile devices have 8-bit microcontrollers running at very low CPU speeds, so public key cryptography of any kind may not even be an option for them.

One-time signatures (OTS), on the other hand, provide an attractive alter- 
native to public key based signatures. Unlike signatures based on public key 
cryptography, OTS is based on nothing more than a one-way function (OWF). 
Examples of conjectured OWFs are SHS [3] and MD5 [4]. OTSs are computa- 
tionally more efficient since no complex arithmetic is involved. 

We observe that, on the one hand, despite the performance advantages they provide, OTSs have gained attention in the security world only in very recent years. On the other hand, server-assisted computation has been well studied in the cryptography literature and seems to have more or less reached maturity. Hence the question to consider is: “Can we combine the efficiency of one-time signatures with server-assisted computation?”

This paper tries to answer this question by revisiting server-assisted computation for digital signatures and presenting the server-assisted one-time signature (SAOTS) protocol, which turns out to be more computationally efficient than previous alternatives and offers other advantages over them.

The rest of this paper is organized as follows. The next section provides some background material. Previous studies on server-assisted digital signature protocols are summarized in Section 3. The proposed SAOTS protocol is presented in Section 4. Sections 5 and 6 present the security analysis and the performance evaluation of the SAOTS protocol, respectively. Finally, Section 7 concludes the paper.
2 Background

2.1 One-Way Functions 

“One-way functions” (OWFs) are functions that are relatively easy to compute but significantly harder to reverse. That is, given x it is easy to compute f(x), but given f(x) it is hard to compute x. OWFs are public functions; no secret keys are involved, and their security lies in their one-wayness. “Hash functions”, which have long been used in computer science, take a variable-length input and convert it to a fixed-length, generally smaller output. Finally, “one-way hash functions” are hash functions that work in only one direction; viewed another way, they are like digital fingerprints: small pieces of data that can serve to identify much larger digital objects (e.g., SHS [3] and MD5 [4]).

SHS and MD5 were originally designed as one-way hash functions but they 
can be easily used as one-way functions when the input message length is set to 
be equal to the length of output. 
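As a minimal sketch, assuming SHA-1 as the 160-bit SHS function, the Python snippet below turns the hash function into a one-way function by accepting only inputs exactly as long as its 20-byte output; the name `owf` is hypothetical.

```python
import hashlib

DIGEST_LEN = hashlib.sha1().digest_size     # 20 bytes (160 bits)

def owf(x: bytes) -> bytes:
    """f(x) = SHA-1(x), restricted to inputs whose length equals the output length."""
    if len(x) != DIGEST_LEN:
        raise ValueError(f"input must be exactly {DIGEST_LEN} bytes")
    return hashlib.sha1(x).digest()

y = owf(bytes(DIGEST_LEN))   # easy to compute; recovering the input from y is assumed hard
```

Because input and output have the same length, the function can also be iterated, which is exactly what the hash chains described next require.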

2.2 Hash Chains 

The idea of a “hash chain” was first proposed by Lamport [5] in 1981 as a safeguard against password eavesdropping. Being an elegant and versatile low-cost technique, the hash chain construction has since found many other applications. A hash chain of length N is constructed by applying a one-way hash function h() recursively to an initial seed value s:

$$K^N = \underbrace{h(h(h(\ldots h(s)\ldots)))}_{N \text{ times}} = h^N(s)$$

The last element $K^N$ resembles the public key in public key cryptography: knowing $K^N$, the value $K^{N-1}$ cannot be generated by anyone who does not know the seed s. This property of hash chains follows directly from the one-wayness of the underlying hash function.

In most hash-chain applications, $K^N$ is first distributed securely, and then the elements of the hash chain are spent one by one, starting from $K^{N-1}$ and continuing until the value of s is reached. At this point the hash chain has been exhausted, and a new hash chain must be generated to proceed.
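The construction and spending order described above can be sketched in Python as follows, assuming SHA-1 as h and a 20-byte seed; the helper names are illustrative. A verifier that holds the securely distributed $K^N$ accepts each newly revealed element by hashing it once and comparing the result with the last value it accepted.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha1(x).digest()

def hash_chain(seed: bytes, length: int) -> list[bytes]:
    """Return [K^0, K^1, ..., K^N] with K^0 = s and K^j = h(K^{j-1}), so K^N = h^N(s)."""
    chain = [seed]
    for _ in range(length):
        chain.append(h(chain[-1]))
    return chain

def accept_next(revealed: bytes, last_accepted: bytes) -> bool:
    """A verifier holding K^j accepts a newly revealed value iff it hashes to K^j."""
    return h(revealed) == last_accepted

chain = hash_chain(b"\x01" * 20, 5)
anchor = chain[-1]                       # K^N, distributed securely in advance
for value in reversed(chain[:-1]):       # spend K^{N-1}, K^{N-2}, ..., s
    assert accept_next(value, anchor)
    anchor = value                       # the newly accepted element becomes the reference
```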

2.3 One-Time Signatures 

The OTS concept was also first proposed by Lamport [6]. The idea behind it is again very easy to understand. A message sender prepares for a digital signature by generating a random number r, which is retained as the private value. He then securely distributes the hash of r, h(r), where h is a one-way function; this represents the public value and is used by receivers as the signature certificate to verify the signature. The signature is sent by distributing the value r itself. Receivers verify that the message could only have been sent by the sender by applying h to r to get h(r). If this matches the value h(r) in the signature certificate, the OTS is considered verified, since only the sender can know r. This, in effect, allows the signing of a predictable 1-bit value. To sign an arbitrary 1-bit value, two random numbers (r1, r2) are needed: both h(r1) and h(r2) are pre-distributed, but at most one of (r1, r2) is revealed as a signature. In the original proposal [6], 160 random numbers out of 320 are revealed as the signature of the 160-bit hash value of any given message.
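A minimal Python sketch of this construction for a 160-bit digest, assuming SHA-1 both as the one-way function h and as the message hash, is shown below. Each digest bit is covered by one pair of random numbers, and the signature reveals one number of each pair (160 of the 320 values). Function names are illustrative, and a key pair must never sign more than one message.

```python
import hashlib
import secrets

def h(x: bytes) -> bytes:
    return hashlib.sha1(x).digest()

def digest_bits(message: bytes) -> list[int]:
    """The 160 bits of h(m), most significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in h(message) for k in range(8)]

def keygen(b: int = 160):
    """Private key: b pairs of random numbers; public key: their hashes."""
    sk = [(secrets.token_bytes(20), secrets.token_bytes(20)) for _ in range(b)]
    pk = [(h(r0), h(r1)) for r0, r1 in sk]
    return sk, pk

def sign(sk, message: bytes) -> list[bytes]:
    """Reveal, for each digest bit, the random number selected by that bit."""
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]

def verify(pk, message: bytes, sig: list[bytes]) -> bool:
    bits = digest_bits(message)
    if len(sig) != len(bits) or len(pk) != len(bits):
        return False
    return all(h(r) == pk[i][bit] for i, (bit, r) in enumerate(zip(bits, sig)))

sk, pk = keygen()
sig = sign(sk, b"pay 10 to Bob")
```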

Besides the computational efficiency mentioned previously, one-time signatures can also be regarded as a more secure solution. OTS is based only on an OWF, whereas public key signatures rely on a hard mathematical problem as well as an OWF, so using OTSs eliminates one more point of vulnerability.

Despite these advantages, OTSs have not found practical use in the security world. We believe this is due to two main reasons:

First, one-time signatures are longer than traditional signatures, which may lead to storage and bandwidth constraints. Recent studies have succeeded in decreasing the length of one-time signatures to some extent. It was previously realized that p out of n random numbers are sufficient to sign a b-bit message if the following inequality holds for the given n and p [7]:



$$2^b \le \binom{n}{p} = \frac{n!}{p!\,(n-p)!} \qquad (1)$$



To sign an arbitrary-length message by OTS, just as with public key based signatures, we can reduce the length of the message m by computing its hash value h(m) and then signing h(m). For instance, for b = 160 (e.g., SHS), n must be at least 165 with subsets of size 75 (p = 75).
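Inequality (1) can be checked numerically. The Python sketch below searches for the smallest n satisfying $\binom{n}{p} \ge 2^b$ for a fixed subset size p; the function name `min_n` is hypothetical. For b = 160 and p = 75 it reproduces the figure of 165 quoted above, with $\binom{165}{75}$ only barely exceeding $2^{160}$.

```python
from math import comb

def min_n(b: int, p: int) -> int:
    """Smallest n with C(n, p) >= 2**b, i.e. enough p-subsets to encode every b-bit digest."""
    n = p
    while comb(n, p) < 2 ** b:
        n += 1
    return n

print(min_n(160, 75))   # 165
```

Python's arbitrary-precision integers make the exact comparison with $2^{160}$ straightforward, so no floating-point approximation of the binomial coefficient is needed.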

The extra length of one-time signatures (which was a concern two decades ago) is negligible today owing to the high speed of modern networks. Hence we think the second disadvantage is the more serious one: in its simple form, a one-time signature scheme can sign only one message per public key. Since the public key must be distributed securely, which is most typically done using a public key signature, the benefit of using a quick and efficient hash function is apparently lost.

Several clever approaches have been proposed to overcome this limitation. One of them is on-line/off-line signatures [8], where the public key of a one-time signature is signed off-line with public key techniques before the message is known. When the message to be signed is at hand, no public key operation needs to be performed, so the response time (real-time efficiency) is improved.

Because of the constraints of mobile devices, we might want to minimize or eliminate public key operations, whether they are performed off-line or not. In that case Merkle's proposal [9] may be preferred, where one-time signatures are embedded in a tree structure, allowing the cost of a single public key signature to be amortized over a multitude of OTSs. The problem with this formulation is the greater length of the signatures: storage and bandwidth requirements become even more severe than for one-time signatures in their simple form, since the signature length grows as the number of signatures generated with the same tree structure increases.
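To illustrate why tree-based amortization lengthens signatures, the sketch below uses a balanced binary hash tree, a common variant of Merkle's idea; his original construction [9] may differ in detail. Each leaf stands for one one-time public key, only the root needs a public key signature, and each signature must carry an authentication path of $\log_2$(number of leaves) hashes. SHA-1, the function names, and the power-of-two leaf count are assumptions of this sketch.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha1(x).digest()

def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """levels[0] holds the hashed leaves, levels[-1] == [root]."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("number of leaves must be a power of two")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def auth_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf `index` up to the root; attached to each signature."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify_leaf(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

one_time_pks = [bytes([i]) * 20 for i in range(8)]   # stand-ins for OTS public keys
levels = build_tree(one_time_pks)
root = levels[-1][0]              # the only value signed with a public key signature
path = auth_path(levels, 5)
```

With 8 leaves the authentication path holds 3 hashes; doubling the number of one-time keys covered by the tree adds one more hash to every signature.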

3 Previous Studies 

Another promising approach to using one-time signatures more than once with a single pre-distributed public key is the Server-Assisted One-Time Signature (SAOTS) protocol we propose in this paper, which relies on a third party (the server). Before introducing it, we give a very short summary of previous studies on employing a powerful server to reduce the computation required to generate digital signatures.

Server-assisted signatures can be divided into three subgroups depending on the trust relationship between the user and the server. More specifically, the server employed may be (1) fully trusted, (2) untrusted or (3) verifiable.

In the first category, after receiving an authentic message from a user (a MAC algorithm, which can be implemented very efficiently, may be used for authentication), a more powerful proxy server generates a public key digital signature for the message on behalf of the user [10]. Notice that the user does not need to perform any public key operation; he just computes a MAC using secret key cryptography. The drawback is that this simple design is only applicable when the user fully trusts the proxy server, since the server can generate forged signatures and the user cannot prove such cheating.

At the opposite end, a totally untrusted server might be utilized, i.e., the server only executes computations for the user. The goal of securely reducing computational costs on the user's machine then becomes more difficult to accomplish, and in fact most of the schemes proposed so far have been found to be insecure. For instance, the protocol proposed by Béguin and Quisquater [12] was later broken by Nguyen and Stern [13]. To our knowledge, designing a secure server-assisted protocol for RSA signatures that utilizes an untrusted server is still an open problem. The situation for DSA is different: a secure (unbroken) example is the interesting approach of Jakobsson and Wetzel [14]. However, in their approach the constrained device still has to perform public key operations to generate a signature, although in reduced amount.

3.1 Verifiable-Server Assisted Signature Protocols

The last server-assisted signature alternative is to employ a verifiable server (VS), i.e., a server whose cheating can be proven. This approach lies somewhere between the other two: the server can cheat, but the user subsequently has the ability to prove this to other parties (e.g., an arbiter). In some papers, a VS is called a semi-trusted server.

¹ In the literature, the fully trusted server method is sometimes called “proxy protocol with full delegation”. Proxy signatures have other variations, mostly designed to restrict the server's signing rights. These variations are less efficient than traditional public key signatures and hence are not of interest in our discussion [11].

In traditional methods of digital signature generation, the signer usually obtains a public key certificate from a certification authority (CA). To trust the legitimacy of signatures, the receiver must trust the CA's certificate-issuance procedures. For instance, the CA could issue a fake certificate for a particular user and then impersonate the user by generating a forged signature. However, if a contract between the CA and the user was signed during the certification process, in a dispute the signer can prove the CA's cheating by asking the CA to produce this contract. Notice the similarity between the trust relationship of the signer and the CA in traditional methods and that of the signer and the server in verifiable-server assisted signature protocols.

The first work that aims to reduce the computational cost of generating digital signatures on low-end devices by employing a powerful VS is the SAS protocol [15]. In [16], the authors extend this work by providing implementation results as well as other details of the scheme. We now provide a brief summary of the SAS protocol; for a comprehensive treatment, please refer to the original papers [15,16].

There is an initialization phase in SAS where each user (originator) gets a certificate from an off-line certification authority for $K^N$ (the last element of a hash chain of length N). In addition, each user must register with a VS (which has the traditional public key based signing capability) before operation. The SAS protocol then works in three rounds:

1. The originator (O) sends m and $K^i$ to VS, where

— m is the message

— $K^i$ is the i-th element of the hash chain. The counter i is initially set to $N-1$ and decremented after each run.

2. Having received O's request, VS checks the following:

— whether O's certificate is revoked or not

— whether $h^{N-i}(K^i) = K^N$ or, more efficiently, $h(K^i) = K^{i+1}$, since $K^{i+1}$ has already been received.

If these checks are OK, VS signs m concatenated with $K^i$ and sends it back to O.

3. After receiving the signed message from VS, O verifies VS's signature, attaches $K^{i-1}$ to this message and sends it to the receiver R.

Upon receipt of the signed message, the receiver verifies VS's signature and checks whether $h(K^{i-1}) = K^i$.
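The three rounds above can be sketched in Python as follows. The sketch is illustrative only: SHA-1 plays the role of h, and the VS's public key signature is replaced by textbook RSA over a toy modulus built from the Mersenne primes $2^{31}-1$ and $2^{61}-1$, which is insecure and merely stands in for whatever scheme a real VS would use. Class and function names are hypothetical, and the certificate revocation check is omitted.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha1(x).digest()

# Toy textbook-RSA key standing in for the VS's signing key (insecure, illustration only).
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
RSA_N = P * Q
RSA_D = pow(E, -1, (P - 1) * (Q - 1))

def vs_sign(data: bytes) -> int:
    return pow(int.from_bytes(h(data), "big") % RSA_N, RSA_D, RSA_N)

def vs_verify(data: bytes, sig: int) -> bool:
    return pow(sig, E, RSA_N) == int.from_bytes(h(data), "big") % RSA_N

class VerifiableServer:
    def __init__(self, certified_last: bytes):
        self.last = certified_last        # K^N from O's certificate, later K^{i+1}

    def handle(self, m: bytes, k_i: bytes) -> int:
        """Round 2: check h(K^i) = K^{i+1}, then sign m || K^i."""
        if h(k_i) != self.last:
            raise ValueError("hash-chain element rejected")
        self.last = k_i
        return vs_sign(m + k_i)

def receiver_accepts(m: bytes, k_i: bytes, sig: int, k_prev: bytes) -> bool:
    """R verifies VS's signature on m || K^i and checks h(K^{i-1}) = K^i."""
    return vs_verify(m + k_i, sig) and h(k_prev) == k_i

# Originator O holds the chain K^0 = s, K^1, K^2, K^3 (here N = 3).
chain = [b"\x02" * 20]
for _ in range(3):
    chain.append(h(chain[-1]))

vs = VerifiableServer(chain[3])
m = b"transfer 10"
sig = vs.handle(m, chain[2])           # round 1: O sends (m, K^2); round 2: VS signs
assert vs_verify(m + chain[2], sig)    # round 3: O checks VS's signature first,
ok = receiver_accepts(m, chain[2], sig, chain[1])   # then releases K^1 to R
```

Note that O must run a public key verification in round 3 before releasing $K^{i-1}$, which is the cost discussed among the weaknesses below.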



² Another important issue for server-assisted signature protocols is “revocation”, which is explored in Appendix A.
3.2 SAS Protocol Weaknesses

We have observed that the SAS protocol has several drawbacks:

1. Verifying VS's signature: In step 3 of the SAS protocol, O must verify VS's signature before sending the signed message to R; otherwise the following attack can be performed:

An attacker modifies the message that O sends to VS, and VS signs this new message instead. If O then reveals $K^{i-1}$ without verifying VS's signature, the result is a forged signature on the message the attacker has generated. Remember that for some constrained devices, public key cryptography is simply untenable, whether it is used for signing or verifying.

Even when public key cryptography is acceptable for the user's device, the efficiency provided by this protocol rests on an assumption that does not always hold. More precisely, if the VS uses the RSA [1] signature scheme, where verification is much more efficient than signing, SAS brings some efficiency. If instead the message is signed with a scheme such as DSS [2], where verification is at least as costly as signing, the SAS protocol becomes less efficient than traditional signing methods.

2. Incompatible Verification: As stated in [16], unlike the proxy signatures explained in Section 3, SAS signatures are not compatible with other primary signature types. Therefore, the receiver must use the custom-built verification method of the SAS protocol.

3. Storing VS's signatures: In the SAS protocol, the signer must store VS's signatures to prove its cheating [15]. For some devices with limited storage capacity, this may also place a burden on operation.

4. Network Overhead: One of the factors that affect the overall performance of the SAS protocol is the round-trip delay between O and VS. To decrease the network delay, one could try to reduce the number of rounds in SAS from three to two (a server-assisted protocol cannot do better than two rounds). However, if O attaches the hash element to the first message he sends to VS in order to save a round, an attacker can easily forge a signed message by modifying the message in transit.

As a result, SAS cannot be a two-round protocol like the SAOTS protocol introduced in the next section, basically because in the two-round case the signature is not bound to the message itself.

4 Server Assisted One-Time Signature (SAOTS) Protocol 

In this section, we propose the server-assisted one-time signature (SAOTS) protocol, the first VS-based approach in which the user does not need to perform any public key operation at all. SAOTS is completely transparent to verifiers (the signatures are indistinguishable from standard signatures). Moreover, unlike in other alternatives, in our protocol the server, not the user, is required to save the signatures for dispute resolution. Operating in two rounds instead of three, SAOTS eliminates all four of the aforementioned drawbacks of the SAS protocol.
4.1 The Basic Idea

Our protocol is built on top of the one-time signature idea. Similar to proxy signatures, where an efficient MAC algorithm is employed to establish an authentic channel between the user and the server, in SAOTS the user sends the message to the server after signing it with a one-time signature. However, this basic idea needs to be enhanced; otherwise we again face the inherent problem of OTS, namely signing only one message per public key. We explain how we have solved this problem in the subsequent paragraphs. Our previous solution to this problem, which makes use of hash chains, is summarized in Appendix B [17].

4.2 Setup 

As a setup, every user registers with a server, generates a one-time private key (random numbers) and a one-time public key (hash values of these random numbers), and distributes the public key to the server in a secure fashion. This can be accomplished with a public key signature if he already has traditional signing capability, or he can directly obtain a certificate from a CA for his one-time public key.

In addition, to produce public-key signatures, the server generates a private 
key on behalf of each registered user and obtains a certificate from a CA for the 
corresponding public key (in the certification process the user confirms that the 
public key belongs to himself). 



4.3 Operation 

The protocol works in two rounds as illustrated in Figure 1: 




Fig. 1. Operation of the server assisted one-time signature (SAOTS) protocol 
1. The user precomputes a second one-time private key/public key pair. When the message to be signed is ready, he concatenates the message with the new one-time public key and signs the result with his previous one-time private key. He then sends the message, the new one-time public key and the one-time signature to the server.

2. Having already securely received the one-time public key needed to check the user's signature on the message, the server verifies the one-time signature. It stores the new one-time public key signed by the user for verifying the next message. It is then ready to sign the message with the user's private key (if the user's certificate is not revoked). Finally, the signed message is transmitted to the intended receiver(s).

Since the signature is indistinguishable from a standard signature, receivers can transparently verify it using the user's public key. One can easily show that this protocol provides all three security services expected of a digital signature, but only if the server does not cheat, i.e., does not sign any message on behalf of the user without the user's approval. We show in the next section how the user can prove the server's cheating. If he cannot, the other parties conclude that the user is the one who actually signed the message.

The user can easily sign further messages by repeating step 1. The server can always verify the one-time signature, since it has securely received the one-time public key in the previous run of the protocol. The server must store all previous messages (as well as the one-time signatures and one-time public keys) for secure operation, but the user does not need to store anything to prove the server's cheating. This will become clearer in the security analysis in the next section.

We would like to point out that the “chaining” technique we use, which attaches the one-time public key for the next message to the current message before signing, was previously suggested in [19] to sign infinite-length digital streams more efficiently.
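The two rounds of SAOTS can be sketched in a single Python process as follows. The paper does not fix a particular one-time signature, so this sketch assumes a Lamport-style OTS over SHA-1 (160 revealed values per signature), and it replaces the user's ordinary signing key held by the server with insecure textbook RSA over a toy modulus. All class and function names are hypothetical; revocation checks and message transport are omitted.

```python
import hashlib
import secrets

def h(x: bytes) -> bytes:
    return hashlib.sha1(x).digest()

def digest_bits(data: bytes) -> list[int]:
    return [(byte >> (7 - k)) & 1 for byte in h(data) for k in range(8)]

def ots_keygen():
    """Lamport-style one-time key pair; the public key is serialized to 160 * 40 bytes."""
    sk = [(secrets.token_bytes(20), secrets.token_bytes(20)) for _ in range(160)]
    pk = b"".join(h(r0) + h(r1) for r0, r1 in sk)
    return sk, pk

def ots_sign(sk, data: bytes) -> list[bytes]:
    return [sk[i][bit] for i, bit in enumerate(digest_bits(data))]

def ots_verify(pk: bytes, data: bytes, sig: list[bytes]) -> bool:
    if len(sig) != 160:
        return False
    return all(h(r) == pk[40 * i + 20 * bit: 40 * i + 20 * bit + 20]
               for i, (bit, r) in enumerate(zip(digest_bits(data), sig)))

# Toy textbook-RSA key the server holds on the user's behalf (insecure, illustration only).
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
RSA_N = P * Q
RSA_D = pow(E, -1, (P - 1) * (Q - 1))

def standard_verify(m: bytes, s: int) -> bool:
    """What any receiver runs: an ordinary signature check with the user's public key."""
    return pow(s, E, RSA_N) == int.from_bytes(h(m), "big") % RSA_N

class User:
    def __init__(self):
        self.sk, self.pk = ots_keygen()        # self.pk is registered with the server

    def request(self, m: bytes):
        next_sk, next_pk = ots_keygen()        # in practice precomputed before m is known
        sig = ots_sign(self.sk, m + next_pk)   # sign m || next one-time public key
        self.sk, self.pk = next_sk, next_pk
        return m, next_pk, sig                 # round 1: sent to the server

class Server:
    def __init__(self, registered_pk: bytes):
        self.current_pk = registered_pk
        self.log = []                          # kept for dispute resolution

    def handle(self, m: bytes, next_pk: bytes, sig: list[bytes]) -> int:
        if not ots_verify(self.current_pk, m + next_pk, sig):
            raise ValueError("one-time signature rejected")
        self.log.append((m, next_pk, sig))
        self.current_pk = next_pk              # used to verify the next request
        return pow(int.from_bytes(h(m), "big") % RSA_N, RSA_D, RSA_N)  # round 2

user = User()
server = Server(user.pk)
s1 = server.handle(*user.request(b"message 1"))
s2 = server.handle(*user.request(b"message 2"))
```

A request whose one-time signature does not verify under the server's stored key is rejected before the server uses the user's signing key, and the server's `log` is what it must retain for dispute resolution.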

5 Security Analysis of SAOTS 

5.1 Security of Underlying Components 

For secure operation, we need to prove the security of signatures of both the 
user and the server. Since the server’s signature is a traditional one, we conclude 
that if the traditional signature algorithm used is a secure one, then the server’s 
signature is also secure. Secondly, we note that the security of the chaining 
technique used in the one-time signature the user generates has been studied 
previously. For the security proofs please refer to [19]. 

³ In some applications the server's public key may be embedded in the verification software from the outset, while the receiver cannot securely obtain the public keys of all possible signers. In that case it is better for the server to sign the message with its own private key after appending a statement to the message saying that it has received it from the user [18]. Another advantage of this method is that the server avoids obtaining a new certificate for each registered user.
5.2 Dispute Resolution

Provided that the underlying signatures making up the SAOTS protocol are secure, we now show how a dispute can be resolved. In case of a dispute, the receiver submits the message and the public key signature received from the server to an arbiter. The arbiter verifies the following:

— the public key of the user is certified by the CA. 

— the public key signature is valid. 

If these checks succeed, the user is given the opportunity to repudiate the message. Two checks decide whether the user’s claim is true:

— The CA is asked to prove that the user’s one-time public key was registered.

— The server is asked to prove that the message was signed with the user’s one-time private key.

As a proof, the server shows all the signed messages received from the user, starting from the first one and continuing until the message in dispute is reached. The arbiter verifies all these one-time signatures. If both the CA and the server successfully show that they did not cheat, the arbiter concludes that the user is dishonest and falsely claims not to have sent the message.
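The arbiter’s walk along the chain can be sketched as follows. This is a minimal illustration, assuming SHA-1 (the SHS used in the paper’s experiments) as the hash and substituting a plain Lamport one-time signature for the paper’s m = 165, p = 75 scheme to keep it short; the function names (`user_sign_chain`, `arbiter_check`, and the helpers) are hypothetical. Each message is signed together with the next one-time public key, so the arbiter can start from the key registered with the CA and verify every stored signature, in order, up to the disputed message.

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    """SHA-1 (SHS) with a 160-bit output."""
    return hashlib.sha1(data).digest()

def lamport_keygen():
    """Stand-in OTS key pair: 160 pairs of 20-byte secrets and their hashes."""
    sk = [(os.urandom(20), os.urandom(20)) for _ in range(160)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def serialize_pk(pk) -> bytes:
    return b"".join(a + b for a, b in pk)

def digest_bits(data: bytes) -> list:
    """The 160 bits of H(data), most significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in H(data) for k in range(8)]

def lamport_sign(sk, data: bytes) -> list:
    return [sk[i][bit] for i, bit in enumerate(digest_bits(data))]

def lamport_verify(pk, data: bytes, sig) -> bool:
    return len(sig) == 160 and all(
        H(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(sig, digest_bits(data))))

def user_sign_chain(messages):
    """User side: returns the key registered with the CA and the records the server stores."""
    sk, registered_pk = lamport_keygen()
    records = []
    for m in messages:
        next_sk, next_pk = lamport_keygen()
        payload = m + serialize_pk(next_pk)   # chaining: attach the next one-time public key
        records.append((m, next_pk, lamport_sign(sk, payload)))
        sk = next_sk                          # each one-time key signs exactly one message
    return registered_pk, records

def arbiter_check(registered_pk, records, disputed: bytes) -> bool:
    """True iff every stored OTS verifies, in order, up to and including the disputed message."""
    pk = registered_pk
    for m, next_pk, sig in records:
        if not lamport_verify(pk, m + serialize_pk(next_pk), sig):
            return False
        if m == disputed:
            return True
        pk = next_pk                          # the verified record certifies the next key
    return False
```

The records are exactly what the server keeps; the user keeps nothing, which matches the storage argument above.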



5.3 Denial of Service Attacks 

In the SAS protocol [16], unlike traditional signature schemes, denial of service (DoS) attacks aiming to deny the server’s service to the users are a concern. The basic idea behind these attacks is simple: by sending legitimate (well-formed) requests, an adversary can force the server to perform many signing operations so that it cannot respond in time to real requests coming from registered users [16]. If SAOTS is used instead, these attacks are eliminated because an adversary cannot forge users’ one-time signatures and therefore cannot generate legitimate requests: a forged request fails the server’s hash-based OTS check before any public-key signing is performed.



6 Performance Evaluation of SAOTS 

Table 1 compares the SAOTS and SAS protocols with respect to the computation required of the participating entities. Note that an efficient encoding method, costing less than one hash operation, was presented in [7]. Encoding refers to computing which subset of the random numbers should be revealed as the OTS of the message.
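To illustrate what an encoding computes, the sketch below maps a 160-bit SHA-1 digest to a 75-element subset of the 165 indices using the combinatorial number system. This is a standard ranking technique chosen for brevity, not necessarily the efficient method of [7], and the function names are hypothetical. It relies on C(165, 75) exceeding 2^160, so that every digest value maps to its own subset.

```python
import hashlib
from math import comb

M, P = 165, 75   # OTS parameters used with SHS in the paper

def encode(value: int, m: int = M, p: int = P) -> list:
    """Map 0 <= value < C(m, p) to a distinct p-element subset of range(m).

    Greedy decoding in the combinatorial number system: value is written as
    C(c_p, p) + ... + C(c_1, 1) with c_p > ... > c_1 >= 0, and the c_j form the subset.
    """
    if not 0 <= value < comb(m, p):
        raise ValueError("value does not fit in C(m, p)")
    subset, k = [], p
    for i in range(m - 1, -1, -1):
        if k == 0:
            break
        c = comb(i, k)            # comb(i, k) is 0 when k > i
        if value >= c:
            subset.append(i)
            value -= c
            k -= 1
    return sorted(subset)

def subset_for_message(msg: bytes) -> list:
    """Indices of the random numbers to reveal for msg: hash first, then encode."""
    return encode(int.from_bytes(hashlib.sha1(msg).digest(), "big"))
```

Because the mapping is a bijection, two different digests always reveal different subsets, which is what prevents one OTS from being reused for another message.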

In the SAOTS protocol, the server needs to perform one hash computation to get the hash of the message and 75 hash computations (p = 75 if SHS is used) to verify the OTS, if all 165 hash values constitute the public key. By a simple trick, and at the cost of one additional hash operation for the server, we can reduce the length of the public key to a single hash value.

Table 1. Computational comparison of SAS and SAOTS protocols

              SAS          SAOTS
Originator    1H + 1V      1H + 1M
Server        2H + 1S      (p + 2)H + 1M + 1S
Receiver      1V + 2H      1V + 1H

H: hash computation
S: traditional public-key signing
V: verification of a public-key signature
M: encoding computation (costs less than one hash)
p: number of hash computations to verify the OTS

The idea is simple: as the public key, compute the hash of the concatenation of all 165 hashes. To allow the OTS to be verified, the signer then sends the 75 chosen random numbers together with the hash values of the other 90 random numbers. In each run of the protocol the user sends one signature and one public key, so if a random number has the same length as a hash value, the signer sends 75 + 90 + 1 = 166 hash-sized values to the server in total.
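The single-hash public key can be sketched as below, again assuming SHA-1 and 20-byte random numbers; the function names are hypothetical, and the revealed subset is hand-picked here rather than produced by an encoding. Verification costs p hashes for the revealed values plus one more to rebuild the compressed key, matching the extra hash noted above.

```python
import hashlib
import os

M = 165   # random numbers per one-time key

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def keygen():
    """Secret: 165 random 20-byte values. Public key: ONE hash over all 165 leaf hashes."""
    secrets = [os.urandom(20) for _ in range(M)]
    public_key = H(b"".join(H(r) for r in secrets))
    return secrets, public_key

def sign(secrets, subset) -> list:
    """Reveal the chosen random numbers; send only the hash of every other one."""
    chosen = set(subset)
    return [r if i in chosen else H(r) for i, r in enumerate(secrets)]

def verify(public_key: bytes, subset, sig) -> bool:
    """Hash the p revealed values (p hashes), then one extra hash to rebuild the key."""
    if len(sig) != M:
        return False
    chosen = set(subset)
    leaves = [H(v) if i in chosen else v for i, v in enumerate(sig)]
    return H(b"".join(leaves)) == public_key
```

With 75 revealed values, the signature holds 165 hash-sized entries; adding the next one-time public key gives the 166 values counted above.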

Despite this reduction in size, there is a trade-off between the SAS and SAOTS protocols with respect to communication efficiency: the round efficiency of the SAOTS protocol, explained in subsection 3.2, comes with an increase in the length of the message transmitted from the user to the server (the messages in the SAS protocol are significantly smaller). In Appendix B, we present a variant of SAOTS in which the exchanged messages are shortened.

To obtain a more concrete comparison of SAS and SAOTS, we implemented both using the MIRACL library [20]. A PC running Windows 2000 with an 800 MHz Pentium III and 128 MB of memory was chosen as the VS, and a PC running Windows 95 with a 200 MHz Pentium I and 32 MB of memory was chosen as the users’ machine. Note that today’s high-end PDAs and palmtops have a processor speed of around 200 MHz. RSA with a 1024-bit key and SHS with a 160-bit output were used, and m = 165, p = 75 were the OTS parameters.



Table 2. Performance measurements of cryptographic primitives (ms)

                   Pentium III 800 MHz    Pentium I 200 MHz
SHS                0.028                  0.156
RSA (verifying)    2.220                  13.893
RSA (signing)      9.454                  59.162
Encoding           0.02                   0.1
Table 2 gives the performance measurements of the cryptographic primitives on the two platforms used, and Table 3 summarizes the results of our experiments. From these tables, we conclude that SAOTS is more efficient than SAS with respect to the computation required of ordinary users: the originator’s cost drops from 14.049 ms to 0.256 ms. In our opinion, the increase in the server’s computation is not a big problem, since in practice a much more powerful machine is usually employed as the server.

Table 3. Experimental comparison of SAS and SAOTS protocols (ms)

                            SAS       SAOTS
Originator’s computation    14.049    0.256
Server’s computation        9.482     11.650
Receiver’s computation      14.205    14.049
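As a rough cross-check, one can price the operation counts of Table 1 with the timings of Table 2. The sketch below assumes the server runs on the Pentium III and the originator and receiver on the Pentium I; the dictionary and function names are hypothetical. Under these assumptions it reproduces the originator and receiver rows of Table 3 exactly, while its server estimates (9.510 ms and 11.630 ms) fall within about 0.03 ms of the measured 9.482 ms and 11.650 ms.

```python
# Table 2: time per primitive in ms on each platform.
TIMING = {
    "Pentium III": {"H": 0.028, "V": 2.220, "S": 9.454, "M": 0.02},
    "Pentium I":   {"H": 0.156, "V": 13.893, "S": 59.162, "M": 0.1},
}
P = 75   # hash computations to verify the OTS

# Table 1: operation counts per entity.
OPS = {
    ("SAS", "originator"):   {"H": 1, "V": 1},
    ("SAOTS", "originator"): {"H": 1, "M": 1},
    ("SAS", "server"):       {"H": 2, "S": 1},
    ("SAOTS", "server"):     {"H": P + 2, "M": 1, "S": 1},
    ("SAS", "receiver"):     {"V": 1, "H": 2},
    ("SAOTS", "receiver"):   {"V": 1, "H": 1},
}

# Assumed placement: the server (VS) on the Pentium III, users on the Pentium I.
PLATFORM = {"originator": "Pentium I", "server": "Pentium III", "receiver": "Pentium I"}

def estimate_ms(protocol: str, role: str) -> float:
    """Sum count * unit cost over the operations Table 1 lists for this entity."""
    costs = TIMING[PLATFORM[role]]
    return sum(count * costs[op] for op, count in OPS[(protocol, role)].items())

for role in ("originator", "server", "receiver"):
    print(f"{role:<11} SAS {estimate_ms('SAS', role):7.3f}   SAOTS {estimate_ms('SAOTS', role):7.3f}")
```

Such a model only predicts the arithmetic cost; the measured server figures also include overheads the operation counts do not capture.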



7 Conclusion 

To generate signatures, getting help from a verifiable server has an advantage over proxy-based solutions since, as opposed to a proxy server, a verifiable server’s cheating can be proven. Verifiable-server-assisted signatures were proposed in the past, but they could not eliminate public-key operations for the signer. In this paper, we propose a new alternative called SAOTS (server-assisted one-time signature) in which, just as with proxy signatures, a public-key signature can be generated without the signer performing any public-key operations at all.

Verification transparency, no need to store past signatures to prove the server’s cheating, and a reduced number of rounds are other advantages of SAOTS. The only drawback of the proposed protocol is the increased length of the message transmitted from the user to the server’s machine.
Appendix A: Revocation in SAOTS Protocol

As far as digital signatures are concerned, revocation means that if a user does something that warrants revoking his security privileges (e.g., he is fired) or suspects that his private key has been compromised, he should no longer be able to generate valid digital signatures on further messages. However, signatures generated prior to revocation may need to remain valid.

In the Online Certificate Status Protocol (OCSP) [21], today’s state-of-the-art approach to the revocation problem, a validation server provides timely revocation information by sending back, upon a verifier’s query, a signed response showing the current status of the sender’s certificate. The drawback is that it is impossible to ask a validation server whether a certificate was valid at the time of signing. Immediate revocation (the user cannot sign immediately after the revocation takes place) is possible if an online VS is employed. To revoke a user’s public key, it is sufficient to notify the server. The server maintains a list of revoked users and refuses to sign on behalf of a user whose public key is in the list.

We now show a deficiency in the revocation capability of the SAS protocol [16], which works in three rounds. Consider a situation where the user gets the public-key signature from the VS in round 2 and postpones the execution of round 3. He then asks the server to revoke his public key (e.g., by claiming that his private key has been stolen). Afterwards, he can cheat by executing round 3 and generating a valid signature although his public key has already been revoked. In SAOTS, this deficiency is eliminated since the protocol works in two rounds instead of three.



Appendix B: SAOTS with Hash Chains 

In the SAOTS protocol, the message the user sends to the server is composed of the message to be signed, its one-time signature, and the public key for the next signature. We now show a variant of SAOTS that reduces the size of this message [17]. As we will see, however, this reduction is possible only at the cost of off-line verification of the server’s signature. In the initialization phase of this variant, each user gets a certificate from a CA for the hash array of length n:



$$\left(K_1^{N},\ K_2^{N},\ \ldots,\ K_n^{N}\right) \qquad (2)$$
where n is chosen to be large enough to encode the hashed message (n = 165 for SHS). Each element of the array is the last element of a hash chain of length N. SAOTS with hash chains again works in two rounds, but now the originator must receive, verify, and store the signature coming from the server:



1. The originator (O) sends m and $S^i$ to VS, where

— m is the message;

— $S^i = (K^i_{j_1}, K^i_{j_2}, \ldots, K^i_{j_p})$ denotes the subset of the array that encodes the message as an OTS (composed of elements of the hash chains).

The counter i is initially set to N − 1 and decremented after each run.

2. Having received O’s request, VS performs the following:

— Checks whether O’s certificate is revoked.

— Computes the encoding of h(m), in other words, finds out which subset $S^i$ corresponds to the OTS of the message.

— Checks, for q = 1 to p, that $h^{N-i}(K^i_{j_q}) = K^N_{j_q}$, or, more efficiently, that $h(K^i_{j_q}) = K^{i+1}_{j_q}$ if $K^{i+1}_{j_q}$ has already been received.

If all these checks succeed, VS signs the message and sends the signature back to both R and O. For secure operation, O should sign the next message only after the (off-line) verification of VS’s signature; otherwise the protocol can be broken. We refer interested readers to [17] for how the attacks work and how verifying the server’s signature avoids them.

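In summary, at the cost of heavier computation and extra storage, SAOTS with hash chains saves a total of (n + 1 − p) hashes in message length: the basic protocol sends p revealed random numbers, n − p hash values and one next public key (n + 1 values in all), whereas this variant sends only the p chain elements of $S^i$.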
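The hash-chain bookkeeping can be sketched as follows, with toy sizes n = 8 and N = 10 and hand-picked index subsets standing in for the n = 165 encoding; SHA-1 plays the role of h, and the class and function names are hypothetical. The server remembers the lowest-index element it has seen on each chain, so checking a new element costs one hash when $K^{i+1}_j$ was already received and N − i hashes otherwise. The sketch deliberately omits the server signature and its off-line verification, which [17] shows are required.

```python
import hashlib
import os

def h(x: bytes) -> bytes:
    return hashlib.sha1(x).digest()

def h_iter(x: bytes, times: int) -> bytes:
    """Apply h repeatedly: h^times(x)."""
    for _ in range(times):
        x = h(x)
    return x

def keygen(n: int, N: int):
    """Seeds K^0_j and the certified anchors K^N_j = h^N(K^0_j) of equation (2)."""
    seeds = [os.urandom(20) for _ in range(n)]
    return seeds, [h_iter(s, N) for s in seeds]

def reveal(seeds, subset, i: int) -> list:
    """Originator's S^i: the elements K^i_j = h^i(K^0_j) for j in the subset."""
    return [h_iter(seeds[j], i) for j in subset]

class ChainVerifier:
    """Server-side check; remembers the lowest-index element seen on each chain."""

    def __init__(self, anchors, N: int):
        self.latest = {j: (N, a) for j, a in enumerate(anchors)}   # chain j -> (index, element)

    def check(self, subset, revealed, i: int) -> bool:
        for j, k in zip(subset, revealed):
            idx, val = self.latest[j]
            # h^(idx - i)(K^i_j) must equal the stored K^idx_j; indices must strictly decrease.
            if i >= idx or h_iter(k, idx - i) != val:
                return False
        for j, k in zip(subset, revealed):
            self.latest[j] = (i, k)
        return True
```

For example, after accepting $S^{N-1}$ on chains 0, 2 and 5, the server can verify the element of chain 2 in $S^{N-2}$ with a single hash.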
Abstract. We present a cryptanalysis of a zero-knowledge identification protocol introduced by Naccache et al. at Eurocrypt ’95.
Our cryptanalysis enables a polynomial-time attacker to pass the iden- 
tification protocol with probability one, without knowing the private key. 

Keywords: Zero-knowledge, Fiat-Shamir Identification Protocol. 



1 Introduction 

An identification protocol enables a verifier to check that a prover knows the private key corresponding to a public key associated with its identity. A protocol is zero-knowledge when the only additional information obtained by the verifier is that the prover knows the corresponding private key [2]. A famous zero-knowledge identification protocol is Fiat-Shamir’s protocol [1], which is provably secure assuming that factoring is hard. The protocol requires performing multiplications modulo an RSA modulus.

A space-efficient variant of the Fiat-Shamir identification protocol was introduced by Naccache [3] and by Shamir [5] at Eurocrypt ’94. This variant requires only a few bytes of RAM, even for an RSA modulus of several thousand bits, and is provably as secure as the original Fiat-Shamir protocol. This variant is particularly interesting when the prover is implemented in a smart-card, in which the amount of RAM is very limited.

However, the time complexity of the previous variant is still quadratic in the 
modulus size, and its implementation on a low-cost smart-card is likely to be 
inefficient. At Eurocrypt ’95, Naccache et al. introduced another Fiat-Shamir 
variant [4]. It uses the same idea for reducing the space-complexity, but the 
prover’s time complexity is now quasi-linear in the modulus size (instead of being 
quadratic). As shown in [4], the new identification protocol can be executed on 
a low-cost smart-card in less than a second. 

In this paper, we describe a cryptanalysis of one of [4]’s time-efficient variants. 
Our cryptanalysis enables a polynomial-time attacker to pass the identification 
protocol with probability one, without knowing the private key. We would like to 
stress that the basic quasi-linear time protocol introduced by [4] remains secure, 
since it is in fact equivalent to standard Fiat-Shamir and hence to factoring. 
2 The Fiat-Shamir Protocol

We briefly recall Fiat-Shamir’s identification protocol [1]. The objective of the prover is to identify itself to any verifier by proving knowledge of a secret s corresponding to a public value v, which is associated with its identity. The protocol is zero-knowledge in that it does not reveal any additional information about s to the verifier. Its security relies on the hardness of factoring an RSA modulus.

Key Generation: The authority generates a k-bit RSA modulus $n = p \cdot q$ and an integer v which is a function of the identity of the prover. Using the factorization of n, it computes a square root s of v modulo n, i.e., $v = s^2 \bmod n$. The authority publishes (n, v) and sends s to the prover.

Identification Protocol:

1. The prover generates a random $x \in \mathbb{Z}_n$ and sends $z = x^2 \bmod n$ to the verifier.

2. The verifier sends a random bit b to the prover.

3. If $b = 0$, the prover sends $y = x$ to the verifier; otherwise it sends $y = x \cdot s \bmod n$.

4. The verifier checks that $y^2 = z \cdot v^b \bmod n$.

5. Steps 1-4 are repeated several times to reduce the cheating probability.
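A toy run of steps 1-4 with an honest prover may help fix the notation. This sketch assumes a tiny modulus n = 1009 · 1013 purely for illustration (real moduli have thousands of bits), and the function names are hypothetical.

```python
import math
import random

def keygen(p: int, q: int, rng: random.Random):
    """Authority: n = p*q, secret s coprime to n, public v = s^2 mod n."""
    n = p * q
    while True:
        s = rng.randrange(2, n)
        if math.gcd(s, n) == 1:
            return n, s * s % n, s

def fiat_shamir(n: int, v: int, s: int, rounds: int, rng: random.Random) -> bool:
    """Honest prover and verifier running steps 1-4 `rounds` times (step 5)."""
    for _ in range(rounds):
        x = rng.randrange(1, n)                 # step 1: random x in Z_n ...
        z = x * x % n                           # ... and commitment z = x^2 mod n
        b = rng.randrange(2)                    # step 2: random challenge bit
        y = x if b == 0 else x * s % n          # step 3: response
        if y * y % n != z * pow(v, b, n) % n:   # step 4: y^2 = z * v^b mod n
            return False
    return True
```

A prover without s can prepare z to answer only one of the two challenges, so each repetition halves its chance of cheating.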

3 The Space-Efficient Variant of Fiat-Shamir’s Protocol 

Fiat-Shamir’s protocol requires performing multiplications modulo an RSA modulus n. It has quadratic time and linear space complexity. Therefore, the original protocol could not be implemented on low-cost smart-cards, which in 1994 contained about 40 bytes of random access memory (RAM). Naccache [3] and Shamir [5] introduced a space-efficient variant which requires only a few bytes of RAM, even for an RSA modulus of several thousand bits, and which is provably as secure as the original Fiat-Shamir protocol.

The idea is the following: assume that the prover is required to compute $z = x \cdot y \bmod n$, where x and y are two large numbers which are already stored in the smart-card (e.g., in its EEPROM^) or whose bytes can be generated on the fly. Then instead of computing $z = x \cdot y \bmod n$, the prover computes

$$z' = x \cdot y + r \cdot n$$

for a random r uniformly distributed in [0, B], for a fixed bound B. The verifier can recover $x \cdot y \bmod n$ by reducing z' modulo n. Moreover, when computing z', the prover does not need to store the intermediate result in RAM. Instead, the successive bytes of z' can be sent out of the card as soon as they are generated. Therefore, a smart-card implementation of the prover needs only a few bytes of RAM (see [5] or [3] for more details).

^ The smart-card EEPROM is a re-writable memory, but writing to it is about one thousand times slower than writing into RAM, so it cannot be used to store fast-changing intermediate data during the execution of an algorithm.

As shown in [5], if B is sufficiently large, there is no loss of security in sending z' instead of z. Namely, from z one can generate $z'' = z + u \cdot n$, where u is a random integer in [0, B]. Letting $z = x \cdot y - \omega \cdot n$, we have:

$$z'' = x \cdot y + (u - \omega) \cdot n$$

Then the statistical distance between the distributions induced by z' and z'' is equal to the statistical distance between the uniform distribution in [0, B] and the uniform distribution in [−ω, B − ω], which is equal to ω/B. Assuming that x and y are both in [0, n], this gives ω ∈ [0, n], so the previous statistical distance is at most n/B. Therefore, by taking B much larger than n (for example, $B = 2^{k+80}$, where k is the bit-size of n), the two distributions are statistically indistinguishable, and any attack against the protocol using z' would be as successful against the protocol using z.
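The statistical-distance step can be checked exactly with rational arithmetic. In this small sketch (names hypothetical), integers are drawn uniformly from {0, ..., B}; the exact distance is then ω/(B + 1), slightly below the ω/B used in the text, and for a 1024-bit n with $B = 2^{k+80}$ the bound falls below $2^{-80}$.

```python
from fractions import Fraction

def uniform(lo: int, hi: int) -> dict:
    """Uniform distribution on the integers lo..hi inclusive."""
    p = Fraction(1, hi - lo + 1)
    return {x: p for x in range(lo, hi + 1)}

def statistical_distance(d1: dict, d2: dict) -> Fraction:
    """Half the L1 distance between two finite distributions."""
    support = set(d1) | set(d2)
    return sum(abs(d1.get(x, 0) - d2.get(x, 0)) for x in support) / 2

# Small instance: the shifted uniform distribution differs on omega points at each end.
B, omega = 1000, 37
sd = statistical_distance(uniform(0, B), uniform(-omega, B - omega))
```

Only the ω values at each end of the two ranges differ, which is why the distance grows linearly in ω and shrinks as B grows.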

The identification protocol is then modified as follows:

Space-Efficient Fiat-Shamir Identification Protocol:

1. The prover generates a random $x \in \mathbb{Z}_n$ and a random $r \in [0, B]$, and sends $z = x^2 + r \cdot n$ to the verifier.

2. The verifier sends a random bit b to the prover.

3. If $b = 0$, the prover sends $y = x$ to the verifier; otherwise it sends $y = x \cdot s + t \cdot n$ for a random $t \in [0, B]$.

4. The verifier checks that $y^2 = z \cdot v^b \bmod n$.

5. Steps 1-4 are repeated several times to reduce the cheating probability.
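In the space-efficient variant the prover never reduces modulo n; only the verifier does. The following sketch assumes the same toy modulus n = 1009 · 1013 as an illustration and takes $B = 2^{k+80}$ as suggested above; the function names are hypothetical.

```python
import math
import random

N_MOD = 1009 * 1013              # toy modulus, illustration only
K = N_MOD.bit_length()           # k, the bit-size of n
B = 1 << (K + 80)                # B = 2^(k+80)

def keygen(rng: random.Random):
    """Public v = s^2 mod n and secret s, with s coprime to n."""
    while True:
        s = rng.randrange(2, N_MOD)
        if math.gcd(s, N_MOD) == 1:
            return s * s % N_MOD, s

def space_efficient_round(v: int, s: int, rng: random.Random) -> bool:
    x = rng.randrange(1, N_MOD)
    r = rng.randrange(0, B + 1)
    z = x * x + r * N_MOD        # step 1: z = x^2 + r*n over the integers, no reduction
    b = rng.randrange(2)         # step 2
    if b == 0:
        y = x                    # step 3, b = 0
    else:
        t = rng.randrange(0, B + 1)
        y = x * s + t * N_MOD    # step 3, b = 1: y = x*s + t*n, again no reduction
    return (y * y - z * pow(v, b, N_MOD)) % N_MOD == 0   # step 4: verifier reduces mod n
```

The added multiples r·n and t·n vanish modulo n, so the verifier’s check is unchanged while the prover only ever streams out integer products.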

4 The Time-Efficient Variant of Fiat-Shamir’s Protocol

The time complexity of the previous variant is still quadratic in the modulus size, 
and its implementation on a low-cost smart-card is likely to be inefficient. At 
Eurocrypt ’95, Naccache et al. introduced yet another Fiat-Shamir variant [4]. 
It uses the same idea as Shamir’s variant for reducing the space-complexity, but 
the prover’s time complexity is now quasi-linear in the modulus size (instead 
of being quadratic). As shown in [4], the identification protocol can then be 
executed on a low-cost smart-card in less than a second. 

The technique consists of representing integers modulo a set of $\ell$ small primes $p_i$ (usually, one takes the first $\ell$ primes). This is called the Residue Number System (RNS) representation. Letting $\Pi = \prod_{i=1}^{\ell} p_i$, by virtue of the Chinese Remainder Theorem, any integer $0 \le x < \Pi$ is uniquely represented by the vector:

$$(x \bmod p_1, \ldots, x \bmod p_\ell)$$
The advantage of this representation is that multiplication has quasi-linear complexity (instead of quadratic complexity): if x and y are represented by the vectors $(x_1, \ldots, x_\ell)$ and $(y_1, \ldots, y_\ell)$, then the product $z = x \cdot y$ is represented by:

$$(x_1 \cdot y_1 \bmod p_1, \ldots, x_\ell \cdot y_\ell \bmod p_\ell)$$

The size $\ell$ of the RNS representation is determined so that all integers used in the protocol are strictly smaller than $\Pi$; the bijection between an integer and its modular representation is then guaranteed by the Chinese Remainder Theorem.
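A minimal RNS sketch, assuming the first ten primes as the base (a toy choice; the names are hypothetical): multiplication is componentwise, and CRT reconstruction recovers the integer product only while it stays below $\Pi$. Beyond that bound the representation silently yields the product modulo $\Pi$, which is why $\ell$ must be chosen large enough.

```python
import math

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # the first l = 10 primes
PI = math.prod(PRIMES)                          # Pi = 6469693230

def to_rns(x: int) -> list:
    """Residues (x mod p_1, ..., x mod p_l) of an integer 0 <= x < Pi."""
    assert 0 <= x < PI
    return [x % p for p in PRIMES]

def rns_mul(a: list, b: list) -> list:
    """Componentwise product: l small multiplications instead of one big one."""
    return [ai * bi % p for ai, bi, p in zip(a, b, PRIMES)]

def from_rns(residues: list) -> int:
    """Chinese Remainder Theorem reconstruction of the unique value in [0, Pi)."""
    x = 0
    for r, p in zip(residues, PRIMES):
        m = PI // p
        x += r * m * pow(m, -1, p)
    return x % PI
```

For instance, 12345 · 67890 = 838102050 is below $\Pi$ and is recovered exactly, whereas 100000 · 100000 exceeds $\Pi$ and comes back reduced modulo $\Pi$.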
The time-efficient variant of the Fiat-Shamir protocol is the following: 

Time-Efficient Variant of the Fiat-Shamir Protocol:

1. The prover generates a random $x \in [0, n]$ and a random $r \in [0, \Pi]$, and sends $z = x^2 + r \cdot n$ to the verifier. The integers x, r and z are represented in RNS.

2. The verifier sends a random bit b to the prover.

3. If $b = 0$, the prover sends $y = x$ to the verifier; otherwise it sends $y = x \cdot s + t \cdot n$ for a random $t \in [0, \Pi]$. The integers x, s and t are represented in RNS.

4. The verifier checks that $y^2 = z \cdot v^b \bmod n$.

5. Steps 1-4 are repeated several times to reduce the cheating probability.

The only difference between this time-efficient variant and Shamir’s space- 
efficient variant is that integers are represented in RNS. Therefore, from a secu- 
rity standpoint, those variants are strictly equivalent. 

However, [4] also introduces a second time-efficient variant, whose goal is to increase the efficiency of the verifier: it enables the verifier to check the prover’s answer in linear time when $b = 0$. In this variant, when $b = 0$, the prover also reveals r, which enables the verifier to check that $z = x^2 + r \cdot n$ by performing the computation in the RNS representation (the equality $z = x^2 + r \cdot n$ is checked modulo each of the primes $p_i$), which takes quasi-linear time instead of quadratic time. More precisely, this variant is the following:

Second Time-Efficient Variant of the Fiat-Shamir Protocol:

1. The prover generates a random $x \in [0, n]$ and a random $r \in [0, \Pi]$, and sends $z = x^2 + r \cdot n$ to the verifier. The integers x, r and z are represented in RNS.

2. The verifier sends a random bit b to the prover.

3. If $b = 0$, the prover sends x and r to the verifier, in RNS representation. If $b = 1$, the prover sends $y = x \cdot s + t \cdot n$ for a random $t \in [0, \Pi]$, where y is represented in RNS.

4. If $b = 0$, the verifier checks that $z = x^2 + r \cdot n$; the test is performed in the RNS representation. If $b = 1$, the verifier checks that $y^2 = z \cdot v \bmod n$.

5. Steps 1-4 are repeated several times to reduce the cheating probability.

This second time-efficient variant is more efficient for the verifier, because when $b = 0$, the check at step 4 is performed in the RNS representation, which has quasi-linear complexity instead of quadratic complexity. Since $b = 0$ in about half of the rounds, the verifier’s time complexity in this second variant is expected to be divided by a factor of approximately two.
5 Cryptanalysis of the Second Time-Efficient Variant of Eurocrypt ’95

We show that the second time-efficient variant is insecure. We describe an at- 
tacker A that passes the identification protocol with probability one, without 
knowing the private key s. 

The key observation is the following: since for $b = 0$ the verifier checks that $z = x^2 + r \cdot n$ in the RNS representation, the equality checked by the verifier is actually:

$$z = x^2 + r \cdot n \bmod \Pi \qquad (1)$$

Since the attacker can choose $x, r \in [0, \Pi]$ instead of $x \in [0, n]$ and $r \in [0, B]$, we may have $x^2 + r \cdot n > \Pi$, and therefore equation (1) does not necessarily imply that $z = x^2 + r \cdot n$ holds over the integers (or equivalently, that x is a square root of z modulo n). Therefore the zero-knowledge security proof no longer applies, which leads to the following attack:

Since $\Pi$ is the product of small primes, it is easy to compute square roots modulo $\Pi$, as opposed to computing square roots modulo n. Therefore, the attacker can generate an integer z at step 1 so that he is guaranteed to succeed if $b = 1$. Then if $b = 0$, the attacker will also succeed by computing a square root modulo $\Pi$, which is easy.

More precisely, at step 1, the attacker generates a random $u \in \mathbb{Z}_n$ and a random $r' \in [0, \Pi]$, and sends $z = (u^2 \cdot v^{-1} \bmod n) + r' \cdot n$ to the verifier. Then at step 3, if $b = 0$, the attacker generates a random $r \in [0, \Pi]$ and solves:

$$x^2 = z - r \cdot n \bmod \Pi$$

Since $\Pi$ is the product of small primes, it suffices to take a square root of $z - r \cdot n$ modulo each of the small primes $p_i$. If $z - r \cdot n$ is not a square modulo a given prime $p_j$, it suffices to modify the value of $r \bmod p_j$ without changing $r \bmod p_i$ for $i \neq j$. This is possible since, in the protocol, r is not required to belong to $[0, \Pi]$. Eventually the attacker sends x and r to the verifier in RNS representation, and the attacker is successful with probability one.

Otherwise, if $b = 1$, the attacker sends $y = u + t \cdot n$ for a random $t \in [0, \Pi]$, and the verifier’s check $y^2 = z \cdot v \bmod n$ succeeds since $u^2 = z \cdot v \bmod n$.

Therefore, in both cases, the attacker passes the identification protocol with 
probability one, without knowing the private key. 
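The attack can be reproduced on toy parameters. This sketch assumes a 16-prime RNS base and a tiny modulus n = 1009 · 1013 whose factors are not among the small primes; all names are hypothetical. For simplicity the verifier receives z as an integer, and the attacker draws r' small enough that z stays below $\Pi$, which keeps the RNS encoding of z exact. The $b = 0$ answer picks $r \bmod p_i$ independently for each prime, which is exactly the freedom the attack exploits.

```python
import math
import random

def first_primes(count: int) -> list:
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

PRIMES = first_primes(16)        # RNS base p_1..p_16 (toy choice)
PI = math.prod(PRIMES)           # Pi, about 3.3e19
N = 1009 * 1013                  # toy modulus; its factors are not among PRIMES

def sqrt_mod(a: int, p: int):
    """Brute-force square root modulo a small prime p, or None."""
    a %= p
    return next((x for x in range(p) if x * x % p == a), None)

def verifier_b0(z: int, x_rns: list, r_rns: list) -> bool:
    """Second-variant check for b = 0: z = x^2 + r*n, tested only modulo each p_i."""
    return all((z - x * x - r * N) % p == 0 for p, x, r in zip(PRIMES, x_rns, r_rns))

def verifier_b1(z: int, y: int, v: int) -> bool:
    return (y * y - z * v) % N == 0          # y^2 = z * v mod n

def attacker_commit(v: int, rng: random.Random):
    """Step 1: z = (u^2 * v^-1 mod n) + r' * n, with r' small enough that z < Pi."""
    u = rng.randrange(1, N)
    r_prime = rng.randrange(0, PI // N - 1)
    return u, u * u * pow(v, -1, N) % N + r_prime * N

def attacker_answer_b0(z: int):
    """Step 3, b = 0: choose r mod each p_i so that z - r*n is a square mod p_i."""
    x_rns, r_rns = [], []
    for p in PRIMES:
        for r_i in range(p):                 # r mod p_i is chosen independently of the others
            x_i = sqrt_mod(z - r_i * N, p)
            if x_i is not None:
                x_rns.append(x_i)
                r_rns.append(r_i)
                break
    return x_rns, r_rns

def run_attack(rounds: int, seed: int) -> bool:
    rng = random.Random(seed)
    while True:                              # authority's key; the attacker never uses s
        s = rng.randrange(2, N)
        if math.gcd(s, N) == 1:
            break
    v = s * s % N
    for _ in range(rounds):
        u, z = attacker_commit(v, rng)
        b = rng.randrange(2)                 # honest verifier's challenge
        if b == 0:
            ok = verifier_b0(z, *attacker_answer_b0(z))
        else:
            ok = verifier_b1(z, u + rng.randrange(0, 1000) * N, v)   # y = u + t*n
        if not ok:
            return False
    return True
```

Because n is coprime to every $p_i$, some choice of $r \bmod p_i$ always makes $z - r \cdot n \equiv 0$, so a square root exists at every prime and the attacker never fails.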

6 Conclusion 

We have shown that one of the time-efficient Fiat-Shamir variants introduced at Eurocrypt ’95 by Naccache et al. is insecure. Namely, a polynomial-time attacker can pass the identification protocol with probability one, without knowing the private key. Consequently, for practical implementations, we recommend using [4]’s first time-efficient variant rather than [4]’s second time-efficient variant, which should be avoided. We believe that our attack illustrates the importance of careful security analysis of even apparently harmless variations of known secure protocols.
Abstract. We introduce a new cryptographic technique that we call universal re-encryption. A conventional cryptosystem that permits re-encryption, such as ElGamal, does so only for a player with knowledge of the public key corresponding to a given ciphertext. In contrast, universal re-encryption can be done without knowledge of public keys. We propose an asymmetric cryptosystem with universal re-encryption that is half as efficient as standard ElGamal in terms of computation and storage.
While technically and conceptually simple, universal re-encryption leads to new types of functionality in mixnet architectures. Conventional mixnets are often called upon to enable players to communicate with one another through channels that are externally anonymous, i.e., that hide information permitting traffic analysis. Universal re-encryption lets us construct a mixnet of this kind in which servers hold no public or private keying material, and may therefore dispense with the cumbersome requirements of key generation, key distribution, and private-key management. We describe two practical mixnet constructions, one involving asymmetric input ciphertexts, and another with hybrid-ciphertext inputs.

Keywords: anonymity, mix networks, private channels, universal re- 
encryption 



1 Introduction 

A mix network or mixnet is a cryptographic construction that invokes a set of 
servers to establish private communication channels [3]. One type of mix net- 
work accepts as input a collection of ciphertexts, and outputs the corresponding 
plaintexts in a randomly permuted order. The main privacy property desired of 
such a mixnet is that the permutation matching inputs to outputs should be 
known only to the mixnet, and no one else. In particular, an adversary should 
be unable to guess which input ciphertext corresponds to an output plaintext 
any more effectively than by guessing at random. 

One common variety of mixnet, known as a re-encryption mixnet, relies on a public-key encryption scheme, such as ElGamal [7], that allows for re-encryption of ciphertexts. For a given public key, a ciphertext C' is said to represent a re-encryption of C if both ciphertexts decrypt to the same plaintext. In a re-encryption mixnet, the inputs are submitted encrypted under the public key of the mixnet. (The corresponding private key is held in distributed form among the servers.) The batch of input ciphertexts is processed sequentially by each mix server. The first server takes the set of input ciphertexts, re-encrypts them, and outputs the re-encrypted ciphertexts in a random order. Each server in turn takes the set of ciphertexts output by the previous server, and re-encrypts and mixes them. The set of ciphertexts produced by the last server may be decrypted by a quorum of mix servers to yield plaintext outputs. Privacy in this mixnet construction derives from the fact that, to any adversary without knowledge of the private key, the ciphertext pair (C, C') is indistinguishable from a pair (C, R) for a random ciphertext R.
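Standard ElGamal re-encryption can be sketched as follows. The sketch assumes a toy group (the order-1019 subgroup of squares modulo the safe prime 2039, generated by 4), far too small for real use, and the function names are hypothetical. Re-encryption multiplies in a fresh encryption of 1, which is why the public key y is needed.

```python
import random

P_MOD = 2039   # toy safe prime p = 2q + 1, illustration only
Q = 1019       # prime order of the subgroup of squares
G = 4          # 4 = 2^2 generates that subgroup

def keygen(rng: random.Random):
    x = rng.randrange(1, Q)
    return x, pow(G, x, P_MOD)             # private x, public y = g^x

def encrypt(y: int, m: int, rng: random.Random):
    k = rng.randrange(1, Q)
    return (m * pow(y, k, P_MOD) % P_MOD, pow(G, k, P_MOD))

def reencrypt(y: int, c, rng: random.Random):
    """Multiply in a fresh encryption of 1: (a * y^k, b * g^k). Requires y."""
    a, b = c
    k = rng.randrange(1, Q)
    return (a * pow(y, k, P_MOD) % P_MOD, b * pow(G, k, P_MOD) % P_MOD)

def decrypt(x: int, c) -> int:
    a, b = c
    return a * pow(b, Q - x, P_MOD) % P_MOD   # b^(-x) computed as b^(q - x) in the subgroup
```

A re-encryption mix server would apply `reencrypt` to every input and output the results in a random order; without y it could not produce a valid re-encryption.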

In this paper, we propose a new type of public-key cryptosystem that permits universal re-encryption of ciphertexts. We introduce the term universal re-encryption to mean re-encryption without knowledge of the public key under which a ciphertext was computed. Like standard re-encryption, universal re-encryption transforms a ciphertext C into a new ciphertext C' with the same corresponding plaintext. The novelty in our proposal is that re-encryption neither requires nor yields knowledge of the public key under which a ciphertext was computed. (George Danezis independently discovered the same essential concept.)

When applied to mix networks, our universal re-encryption technique offers 
new and interesting functionality. Most importantly, mix networks based on 
universal re-encryption dispense with the cumbersome protocols that traditional 
mixnets require in order to establish and maintain a shared private key. We 
discuss more benefits and applications of universal mixnets in the next section. 
We construct a universal mixnet based on universal re-encryption roughly as 
follows. Every input to the mixnet is encrypted under the public key of the 
recipient for whom it is intended. Thus, unlike standard re-encryption mixnets, 
universal mixnets accept ciphertexts encrypted under the individual public 
keys of receivers, rather than encrypted under the unique public key of the 
mix network. These ciphertexts are universally re-encrypted and mixed by each 
server. The output of a universal mixnet is a set of ciphertexts. Recipients 
can retrieve from the set of output ciphertexts those addressed to them, and 
decrypt them. 
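To make the universal-mixnet flow concrete, here is a minimal sketch of one ElGamal-based way to obtain universal re-encryption. It is assumed here for illustration only; the paper’s own proposal is given in Section 3. A ciphertext is a pair of ElGamal ciphertexts, one encrypting m and one encrypting 1, and the second pair is used to re-randomize both without the public key, doubling ElGamal’s storage and computation. The toy group (order-1019 subgroup modulo 2039) and all names are hypothetical.

```python
import random

P_MOD, Q, G = 2039, 1019, 4   # toy safe-prime group, far too small for real use

def keygen(rng: random.Random):
    x = rng.randrange(1, Q)
    return x, pow(G, x, P_MOD)

def u_encrypt(y: int, m: int, rng: random.Random):
    k0, k1 = rng.randrange(1, Q), rng.randrange(1, Q)
    return ((m * pow(y, k0, P_MOD) % P_MOD, pow(G, k0, P_MOD)),   # ElGamal encryption of m
            (pow(y, k1, P_MOD), pow(G, k1, P_MOD)))               # ElGamal encryption of 1

def u_reencrypt(c, rng: random.Random):
    """Re-randomize using only the ciphertext itself: no public key needed."""
    (a0, b0), (a1, b1) = c
    k0, k1 = rng.randrange(1, Q), rng.randrange(1, Q)
    return ((a0 * pow(a1, k0, P_MOD) % P_MOD, b0 * pow(b1, k0, P_MOD) % P_MOD),
            (pow(a1, k1, P_MOD), pow(b1, k1, P_MOD)))

def u_decrypt(x: int, c):
    """Return m if c is addressed to private key x, else None."""
    (a0, b0), (a1, b1) = c
    if a1 * pow(b1, Q - x, P_MOD) % P_MOD != 1:   # the encryption of 1 must decrypt to 1
        return None
    return a0 * pow(b0, Q - x, P_MOD) % P_MOD

def mix(batch: list, servers: int, rng: random.Random) -> list:
    """Each server holds no keys: it re-encrypts every item and shuffles the batch."""
    for _ in range(servers):
        batch = [u_reencrypt(c, rng) for c in batch]
        rng.shuffle(batch)
    return batch
```

Each recipient runs `u_decrypt` on every output item and keeps those that decrypt; this trial decryption over the whole batch is the cost of universal mixnets noted in the next section.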

Organization. The rest of the paper is organized as follows. In the next section, 
we give an overview of the main properties that distinguish universal mixnets 
from standard mixnets, and give one example of a new application made possible 
by universal mixnets. This is followed in section 3 by a formal definition of 
semantic security for universal re-encryption, as well as a proposal for creating 
a public-key cryptosystem with universal re-encryption based on ElGamal. In 
section 4, we describe our construction for an asymmetric universal mixnet. We 
define and prove the security properties of our system in section 5. In section 6, 
we propose a hybrid variant of our universal mixnet construction that combines 
public-key and symmetric encryption to handle long messages efficiently. We
conclude in section 7. 

2 Universal Mixnets: Properties and Applications 

To motivate the constructions of this paper, we list here some of the main 
properties that set apart universal mixnets from traditional re-encryption 
mixnets. We also give one example of a new application made possible by 
universal mixnets: Anonymization of RFID tags. 

Universal mixnets hold no keying material. A universal mixnet operates 
without a monolithic public key and thus dispenses at the server level with 
the complexities of key generation, key distribution, and key maintenance. 
This allows a universal mixnet to be set up more efficiently and with greater 
flexibility than a traditional re-encryption mixnet. A universal mixnet can be 
rapidly re-configured: Servers can enter and leave arbitrarily, even in the middle 
of a round of processing, without going through any setup. A mix server that 
crashes or otherwise disappears in the midst of the mixing process can thus be 
easily replaced by another server. 

Universal mixnets guarantee forward anonymity. The absence of shared 
keys means that universal mixnets offer perfect forward-anonymity. Even if all 
mix servers become corrupted, the anonymity of previously mixed batches is 
preserved (provided that servers do not store the permutations or re-encryption 
factors they used to process their inputs). In contrast, if the keying material of 
a standard mix is revealed, an adversary with transcripts from previous mix 
sessions can compromise the privacy of users. 

Universal mixnets do not support escrow capability. The flip-side 
of perfect forward-anonymity is that it is not possible to escrow the
privacy offered by a universal mixnet in a straightforward fashion. Escrow 
is only achievable in a universal mix as long as every server involved in the 
mixing remembers how it permuted its inputs and is willing to reveal that 
permutation. This may be a drawback from the perspective of law enforcement. 
In comparison, escrow is possible in a traditional mix, provided that the shared 
key can be reconstructed. This requires the participation of only a quorum of 
servers, not all of them. 

Efficiency. We present in this paper a public-key cryptosystem with universal 
re-encryption that is half as efficient as standard ElGamal: It requires exactly 
twice as much storage, and also twice as much computation for encryption, re- 
encryption, and decryption. In this regard, the universal mixnet constructions 
we propose in this paper are practical. The drawback of a universal mixnet, as 
we discuss in detail below, is that receivers must attempt to decrypt all output 
items in order to identify the messages intended for them. 
2.1 Anonymizing RFID Tags

An interesting new application made possible by universal mixnets is the 
anonymization of radio-frequency identification (RFID) tags. An RFID tag is 
a small device that is used to locate and identify physical objects. RFID tags 
have very limited processing ability (insufficient to perform any re-encryption 
of data), but they allow devices to read and write to their memory [15,16]. 
Communication with RFID tags is performed by means of radio, and the tags 
themselves often obtain power by induction. Examples of uses of RFID tags 
include the theft-detection tags attached to consumer items in stores and the 
plaques mounted on car windshields for automated toll payment. Due to the 
projected decrease in the cost of RFID tags, their use is likely to extend in the 
near future to a wide range of general consumer items, including possibly even 
banknotes [20,12]. 

This raises concerns of an emerging privacy threat. Most RFID tags emit 
static identifiers. Thus, an adversary with control of a large base of readers for 
RFID tags may be able to track the movement of any object in which an RFID 
tag is embedded, and hence learn the whereabouts of the owner of that object. 
In order to prevent tracking of RFID tags, one could let some set of (honest- 
but-curious) servers perform re-encryption of the information that is publicly 
readable from RFID tags. The resulting system is similar to a mixnet, in which 
the permutation of ciphertexts is replaced by the movement of the RFID tags. 

A traditional mix network, however, only partially solves the problem of 
tracking. The difficulty is that the data contained in different RFID tags may be 
encrypted under different public keys, depending on who possesses the authority 
to access that data. For example, while the data contained in tags used for 
automated toll payment may be encrypted under the public key of the transit 
agency, the data contained in tags attached to merchandise in a department 
store may be encrypted under the public key of that department store. To re- 
encrypt RFID tag data, a traditional mix network would need knowledge of the 
key under which that data was encrypted. The public key associated with an 
RFID tag could be made readable, but then the public key itself becomes an 
identifier permitting a certain degree of tracking. This is particularly the case if 
a user carries a collection of tags, and may therefore be identified by means of a 
constellation of public keys. 

Universal mixnets offer a means of addressing the problem of RFID-tag pri- 
vacy. If the data contained in RFID tags is encrypted with a cryptosystem that 
permits universal re-encryption, then this data can be re-encrypted without 
knowledge of the public key. Thus, universal re-encryption may offer heightened
privacy in this setting by permitting agents to perform re-encryption without 
knowledge of public keys. While there have been previous designs using mixes 
for the purposes of privacy protection for low-power devices (e.g., [14]), universal 
re-encryption permits significant protocol and management simplification. 

3 Universal Re-encryption

A conventional randomized public-key cryptosystem is a triple of algorithms, 
CS = (KG, E, D), for key generation, encryption, and decryption. We assume, as 
is often the case for discrete-log-based cryptosystems, that system parameters 
and underlying algebraic structures for CS are published in advance by a trusted
party. These are generated according to a common security parameter k. System
parameters include or imply specifications of $M$, $\mathcal{C}$, and $R$, a message space,
ciphertext space, and set of encryption factors respectively. In more detail:

- The key-generation algorithm $(PK, SK) \leftarrow KG$ outputs a random key pair.

- The encryption algorithm $C \leftarrow E(m, r, PK)$ is a deterministic algorithm
that takes as input a message $m \in M$, an encryption factor $r \in R$ and a
public key $PK$, and outputs a ciphertext $C \in \mathcal{C}$.

- The decryption algorithm $m \leftarrow D(SK, C)$ takes as input a private key $SK$
and ciphertext $C \in \mathcal{C}$ and outputs the corresponding plaintext.

A critical security property for providing privacy in a mix network is that of 
semantic security. Loosely speaking, this property stipulates the infeasibility of 
learning any information about a plaintext from a corresponding ciphertext [8].
For a more formal definition, we consider an adversary that is given a public key
$PK$, where $(PK, SK) \leftarrow KG$. This adversary chooses a pair $(m_0, m_1)$ of plain-
texts. Corresponding ciphertexts $(C_0, C_1) = (E(m_0, r_0, PK), E(m_1, r_1, PK))$ for
$r_0, r_1 \in_U R$ are computed, where $\in_U$ denotes uniform, random selection. For a
random bit $b$, the adversary is given the pair $(C_b, C_{1-b})$, and tries to guess $b$. The
cryptosystem CS is said to be semantically secure if the adversary can guess $b$
with advantage at most negligible in $k$, i.e., with probability at most negligibly
larger than 1/2.

For a re-encryption mix network, an additional component known as a re- 
encryption algorithm, denoted by Re, is required in CS. This algorithm re- 
randomizes the encryption factor in a ciphertext. In a standard cryptosystem, 
this means that $C' \leftarrow Re(C, r, PK)$ for $C, C' \in \mathcal{C}$, $r \in R$, and a public key $PK$.
Observe that re-encryption, in contrast to encryption, may be executed without
knowledge of a plaintext. The notion of semantic security may be naturally
extended to apply to the re-encryption operation by considering an adversary
that chooses ciphertexts $(C_0, C_1)$ under $PK$. The property of semantic security
under re-encryption, then, means the following: Given respective re-encryptions
$(C'_b, C'_{1-b})$ in a random order, the adversary cannot guess $b$ with non-negligible
advantage in $k$. Provided that Re yields the same distribution of ciphertexts as
E (given $r \in_U R$) or that the two distributions are indistinguishable, it may be
seen that basic semantic security implies semantic security under re-encryption.
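For contrast with the universal variant introduced below, the following minimal Python sketch shows standard ElGamal re-encryption, which multiplies a ciphertext by a fresh encryption of 1 and therefore needs the public key $y$. The parameters $p = 2039$, $q = 1019$, $g = 4$ are toy values assumed here for illustration only and offer no security; the function names are hypothetical.

```python
# Toy group (assumption, insecure): p = 2q + 1 with p, q prime; g = 4 has order q.
P, Q_ORDER, G = 2039, 1019, 4


def encrypt(m, r, y):
    """Standard ElGamal E(m, r, PK) = (m * y^r, g^r)."""
    return (m * pow(y, r, P) % P, pow(G, r, P))


def re_encrypt(c, r, y):
    """Standard Re(C, r, PK): multiply C by a fresh encryption of 1 under y."""
    a, b = c
    return (a * pow(y, r, P) % P, b * pow(G, r, P) % P)


def decrypt(c, x):
    """Recover m = a / b^x."""
    a, b = c
    return a * pow(pow(b, x, P), -1, P) % P


x = 55                      # private key
y = pow(G, x, P)            # public key
m = pow(G, 8, P)            # plaintexts are encoded as elements of the subgroup
c = encrypt(m, 10, y)
c2 = re_encrypt(c, 20, y)   # resulting encryption factor is 10 + 20 = 30
assert decrypt(c2, x) == m
assert c2 == encrypt(m, 30, y)
```

The last assertion illustrates why Re and E yield the same distribution here: re-encrypting with factor 20 gives exactly the encryption with factor 10 + 20. Without $y$, `re_encrypt` cannot be evaluated, which is the limitation universal re-encryption removes.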

Bellare et al. [1] define another useful property possessed by the ElGamal 
cryptosystem. Known as “key-privacy,” this property may be loosely stated as 
follows. Given a ciphertext encrypted under a public key randomly selected from
a published pair $(PK_0, PK_1)$, an adversary cannot determine which key corresponds
to the ciphertext with non-negligible advantage. Key-privacy is one feature
of the security property we develop in this paper for universal re-encryption.

As already explained, a universal cryptosystem permits re-encryption without
knowledge of the public key corresponding to a given ciphertext. Let us denote
such a cryptosystem by $UCS = (UKG, UE, URe, UD)$, where UKG, UE, and
UD are key generation, encryption, and decryption algorithms. These are defined
as in a standard cryptosystem. The difference between a universal cryptosystem
UCS and a standard cryptosystem resides in the re-encryption algorithm URe.
The algorithm URe takes as input a ciphertext $C$ and re-encryption factor $r$, but
no public key $PK$. Thus, we have $C' \leftarrow URe(C, r)$ for $C, C' \in \mathcal{C}$, $r \in R$.

To define universal semantic security under re-encryption, i.e., with respect
to URe, it is necessary to consider an adversarial experiment that is a variant on 
the standard one for semantic security. We define an experiment uss as follows for 
a (stateful) adversarial algorithm A. This experiment terminates on issuing an 
output bit. As above, we assume an appropriate implicit parametrization of UCS 
under security parameter k. The idea behind the experiment is as follows. The 
adversary is permitted to construct universal ciphertexts under two randomly 
generated keys, $PK_0$ and $PK_1$. These ciphertexts are then re-encrypted. The aim
of the adversary is to distinguish between the two re-encryptions. The adversary 
should be unable to do so with non-negligible advantage. 

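Experiment $\mathrm{Exp}^{uss}_A(UCS, k)$

$PK_0 \leftarrow UKG$; $PK_1 \leftarrow UKG$;

$(m_0, m_1, r_0, r_1) \leftarrow A(PK_0, PK_1, \text{“specify ciphertexts”})$;
if $m_0, m_1 \notin M$ or $r_0, r_1 \notin R$ then
output ‘0’;

$C_0 \leftarrow UE(m_0, r_0, PK_0)$; $C_1 \leftarrow UE(m_1, r_1, PK_1)$;
$r'_0, r'_1 \in_U R$;

$C'_0 \leftarrow URe(C_0, r'_0)$; $C'_1 \leftarrow URe(C_1, r'_1)$;
$b \in_U \{0, 1\}$;

$b' \leftarrow A(C'_b, C'_{1-b}, \text{“guess”})$;

if $b = b'$ then output ‘1’ else output ‘0’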

We say that UCS is semantically secure under re-encryption if for any adversary
$A$ with resources polynomial in $k$, the quantity $\Pr[\mathrm{Exp}^{uss}_A(UCS, k) = 1] - 1/2$
is negligible in $k$.
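The experiment can be read as a small test harness. The Python sketch below runs $\mathrm{Exp}^{uss}$ once against a pluggable adversary object, instantiating UCS with the ElGamal-based construction of Section 3.1 over a toy group. The adversary interface (`specify`, `guess`), the toy parameters, the choice to draw re-encryption factors from nonzero residues, and the baseline `RandomGuesser` are illustrative assumptions, not part of the paper.

```python
import secrets

# Toy group (assumption, insecure): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q_ORDER, G = 2039, 1019, 4


def rand_exp():
    return secrets.randbelow(Q_ORDER - 1) + 1  # nonzero element of Z_q


def in_group(v):
    return isinstance(v, int) and 0 < v < P and pow(v, Q_ORDER, P) == 1


def valid_factor(r):
    """Membership test for R = Z_q x Z_q."""
    return (isinstance(r, tuple) and len(r) == 2
            and all(isinstance(k, int) and 0 <= k < Q_ORDER for k in r))


def ukg():
    x = rand_exp()
    return pow(G, x, P), x


def ue(m, r, y):
    k0, k1 = r
    return ((m * pow(y, k0, P) % P, pow(G, k0, P)), (pow(y, k1, P), pow(G, k1, P)))


def ure(c, r):
    (a0, b0), (a1, b1) = c
    k0, k1 = r
    return ((a0 * pow(a1, k0, P) % P, b0 * pow(b1, k0, P) % P),
            (pow(a1, k1, P), pow(b1, k1, P)))


def exp_uss(adversary):
    """One run of Exp^uss; returns 1 if the adversary guesses b correctly, else 0."""
    pk0, _ = ukg()
    pk1, _ = ukg()
    m0, m1, r0, r1 = adversary.specify(pk0, pk1)
    if not (in_group(m0) and in_group(m1) and valid_factor(r0) and valid_factor(r1)):
        return 0
    c0p = ure(ue(m0, r0, pk0), (rand_exp(), rand_exp()))
    c1p = ure(ue(m1, r1, pk1), (rand_exp(), rand_exp()))
    b = secrets.randbelow(2)
    first, second = (c0p, c1p) if b == 0 else (c1p, c0p)
    return 1 if adversary.guess(first, second) == b else 0


class RandomGuesser:
    """Baseline adversary; its success probability is exactly 1/2."""

    def specify(self, pk0, pk1):
        return pow(G, 1, P), pow(G, 2, P), (3, 4), (5, 6)

    def guess(self, first, second):
        return secrets.randbelow(2)
```

Estimating the advantage of a candidate adversary then amounts to averaging `exp_uss` over many runs and subtracting 1/2.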

The experiment uss captures the idea that the keys associated with cipher- 
texts are concealed by the re-encryption process in UCS. Thus, even an adver- 
sary who can compose the ciphertexts undergoing re-encryption cannot make use 
of differences in public keys to defeat the semantic security of the cryptosystem. 



3.1 Universal Re-encryption Based on ElGamal 

We present a public-key cryptosystem with universal re-encryption that may be 
based on the ElGamal cryptosystem implemented over any suitable algebraic 
group. The basic idea is simple: We append to a standard ElGamal ciphertext 
a second ciphertext on the identity element. By exploiting the algebraic homo- 
morphism of ElGamal, we can use the second ciphertext to alter the encryption 
factor in the first ciphertext. As a result, we can dispense with knowledge of the
public key in the re-encryption operation. 
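The homomorphism this idea relies on can be checked numerically. In this minimal Python sketch over a toy group (parameters and helper names are illustrative assumptions), writing $E[m]$ for an ElGamal encryption of $m$, multiplying two ciphertexts component-wise under the same key yields an encryption of the product. Multiplying a ciphertext by a power of an encryption of 1 changes its encryption factor but not its plaintext, and that last step never touches the public key.

```python
import secrets

# Toy group (assumption, insecure): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q_ORDER, G = 2039, 1019, 4


def enc(m, y):
    k = secrets.randbelow(Q_ORDER)
    return (m * pow(y, k, P) % P, pow(G, k, P))


def dec(c, x):
    a, b = c
    return a * pow(pow(b, x, P), -1, P) % P


def mul(c, d):
    """Component-wise product: E[a] x E[b] = E[ab] (same key)."""
    return (c[0] * d[0] % P, c[1] * d[1] % P)


def power(c, t):
    """Raise both components to t: E[1]^t is again an encryption of 1."""
    return (pow(c[0], t, P), pow(c[1], t, P))


x = 321
y = pow(G, x, P)
a, b = pow(G, 5, P), pow(G, 7, P)
assert dec(mul(enc(a, y), enc(b, y)), x) == a * b % P

c, blank = enc(a, y), enc(1, y)
c_new = mul(c, power(blank, 9))   # uses only c and blank, never y
assert dec(c_new, x) == a
```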

Let E[m] loosely denote the ElGamal encryption of a plaintext m (under 
some key). In a universal cryptosystem, a ciphertext on message $m$ consists of
a pair $[E[m]; E[1]]$. ElGamal possesses a homomorphic property, namely that
$E[a] \times E[b] = E[ab]$ for group operator $\times$. Thanks to this property, the second
component can be used to re-encrypt the first without knowledge of the associated
public key. To provide more detail, let Q denote the underlying group
for the ElGamal cryptosystem; let q denote the order of Q. (Here the security
parameter k is implicit in the choice of Q.) Let g be a published generator for Q.
The universal cryptosystem is as follows. Note that we assume random selection 
of encryption and re-encryption factors in this description. 



- Key generation (UKG): Output $(PK, SK) = (y = g^x, x)$ for $x \in_U \mathbb{Z}_q$.

- Encryption (UE): Input comprises a message $m$, a public key $y$, and a
random encryption factor $r = (k_0, k_1) \in \mathbb{Z}_q^2$. The output is a ciphertext
$C = [(\alpha_0, \beta_0); (\alpha_1, \beta_1)] = [(m y^{k_0}, g^{k_0}); (y^{k_1}, g^{k_1})]$. We write $C = UE_{PK}(m, r)$
or $C = UE_{PK}(m)$ for brevity.

- Decryption (UD): Input is a ciphertext $C = [(\alpha_0, \beta_0); (\alpha_1, \beta_1)]$ under public
key $y$, together with the corresponding private key $x$. Verify $\alpha_0, \beta_0, \alpha_1, \beta_1 \in Q$;
if not, the decryption fails, and a special symbol $\perp$ is output. Compute
$m_0 = \alpha_0 / \beta_0^x$ and $m_1 = \alpha_1 / \beta_1^x$. If $m_1 = 1$, then the output is $m = m_0$.
Otherwise, the decryption fails, and the special symbol $\perp$ is output. Note that
this ensures a binding between ciphertexts and keys: a given ciphertext can be
decrypted only under one given key.

- Re-encryption (URe): Input is a ciphertext $C = [(\alpha_0, \beta_0); (\alpha_1, \beta_1)]$ with
a random re-encryption factor $r' = (k'_0, k'_1) \in \mathbb{Z}_q^2$. Output is a ciphertext
$C' = [(\alpha'_0, \beta'_0); (\alpha'_1, \beta'_1)] = [(\alpha_0 \alpha_1^{k'_0}, \beta_0 \beta_1^{k'_0}); (\alpha_1^{k'_1}, \beta_1^{k'_1})]$, where $k'_0, k'_1 \in_U \mathbb{Z}_q$.
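The four algorithms translate almost line by line into code. The following Python sketch is a minimal illustration, assuming a toy group ($p = 2039$, $q = 1019$, $g = 4$, far too small for real use) and using `None` in place of the failure symbol $\perp$. Encryption and re-encryption exponents are drawn from the nonzero residues modulo $q$, a small simplification that keeps the second ciphertext from degenerating to $(1, 1)$. All function names are hypothetical.

```python
import secrets

# Toy group (assumption, insecure): p = 2q + 1 with p, q prime; g = 4 has order q.
P, Q_ORDER, G = 2039, 1019, 4


def rand_exp():
    """Random nonzero exponent in Z_q."""
    return secrets.randbelow(Q_ORDER - 1) + 1


def in_group(v):
    """Membership test for the order-q subgroup Q of Z_p^*."""
    return 0 < v < P and pow(v, Q_ORDER, P) == 1


def ukg():
    """UKG: returns (PK, SK) = (y = g^x, x)."""
    x = rand_exp()
    return pow(G, x, P), x


def ue(m, y, k0=None, k1=None):
    """UE: [(m*y^k0, g^k0); (y^k1, g^k1)]."""
    k0 = rand_exp() if k0 is None else k0
    k1 = rand_exp() if k1 is None else k1
    return ((m * pow(y, k0, P) % P, pow(G, k0, P)),
            (pow(y, k1, P), pow(G, k1, P)))


def ud(c, x):
    """UD: returns m, or None (standing in for the failure symbol)."""
    (a0, b0), (a1, b1) = c
    if not all(in_group(v) for v in (a0, b0, a1, b1)):
        return None
    m0 = a0 * pow(pow(b0, x, P), -1, P) % P
    m1 = a1 * pow(pow(b1, x, P), -1, P) % P
    return m0 if m1 == 1 else None


def ure(c, k0p=None, k1p=None):
    """URe: [(a0*a1^k0', b0*b1^k0'); (a1^k1', b1^k1')]; no public key needed."""
    (a0, b0), (a1, b1) = c
    k0p = rand_exp() if k0p is None else k0p
    k1p = rand_exp() if k1p is None else k1p
    return ((a0 * pow(a1, k0p, P) % P, b0 * pow(b1, k0p, P) % P),
            (pow(a1, k1p, P), pow(b1, k1p, P)))


# Usage: encrypt, re-encrypt twice without y, then decrypt.
y, x = ukg()
m = pow(G, 42, P)          # messages must be elements of Q
c = ure(ure(ue(m, y)))
assert ud(c, x) == m
```

Decrypting under another private key $x'$ fails because $\alpha_1 / \beta_1^{x'} = g^{k_1(x - x')} \neq 1$ whenever $k_1 \neq 0$ and $x' \neq x$ in $\mathbb{Z}_q$; this is the binding between ciphertexts and keys noted above, and it survives re-encryption since the exponent is only multiplied by a nonzero $k'_1$.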



Observe that the ciphertext size and the computational costs for all algorithms 
are exactly twice those of the basic ElGamal cryptosystem. The properties 
of standard semantic security and also universal semantic security under re- 
encryption (as characterized by experiment uss) may be shown straightforwardly 
to be reducible to the Decision Diffie-Hellman (DDH) assumption [2] over the 
group Q, in much the same way as the semantic security of ElGamal [19]. Thus, 
one possible choice of Q is the subgroup of order q of $\mathbb{Z}_p^*$, where p and q are
primes such that $q \mid p - 1$. Throughout the remainder of the paper, we work
with the ElGamal implementation of universal re-encryption, and let g denote 
a published generator for the choice of underlying group Q. 
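As a concrete illustration of this parameter choice, the Python sketch below checks that $(p, q, g)$ describe a subgroup of prime order $q$ of $\mathbb{Z}_p^*$ with $q \mid p - 1$, and derives a generator by raising a base element to the power $(p-1)/q$. Trial-division primality testing is used only because the example numbers are tiny; a real deployment would use standardized large parameters and a proper primality test. The helper names are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate only for toy-sized integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def check_params(p, q, g):
    """True if g generates the subgroup of prime order q in Z_p^*."""
    return (is_prime(p) and is_prime(q) and (p - 1) % q == 0
            and 1 < g < p and pow(g, q, p) == 1)


def subgroup_generator(p, q, start=2):
    """Map h into the order-q subgroup via g = h^((p-1)/q); skip results equal to 1."""
    for h in range(start, p):
        g = pow(h, (p - 1) // q, p)
        if g != 1:
            return g
    raise ValueError("no generator of order q found")


p, q = 2039, 1019                 # toy safe prime p = 2q + 1 (insecure size)
g = subgroup_generator(p, q)
assert g == 4 and check_params(p, q, g)
```

Because $q$ is prime, any $g \neq 1$ with $g^q = 1$ has order exactly $q$, which is why `check_params` needs no further test.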



4 Universal Mix Network Construction 

We use the following scenario to introduce our universal mixnet construction. 
We consider a number of senders who wish to send messages to recipients in 
such a way that the communication is concealed from everyone but the sender 
and recipient themselves. In other words, we wish to establish channels between 
senders and receivers that are externally anonymous. We assume that every
recipient has an ElGamal private/public key pair $(x, y = g^x)$ in some published
group Q. We also assume that every sender knows the public key of all the 
receivers with whom she intends to communicate. (Alternatively, the sender 
may have a “blank” ciphertext for this party. By this we mean an encryption 
using UE of the identity element in Q under the public key of the recipient. 
A “blank” may be filled in without knowledge of the corresponding public key 
through exploitation of the underlying algebraic homomorphism in ElGamal.) 
The communication protocol proceeds as follows: 

1. Submission of inputs. Senders post to a bulletin board messages that 
are universally encrypted under the public key of the recipient for whom 
they are intended. Every entry on the bulletin board thus consists of a pair 
of ElGamal ciphertexts $(E[m]; E[1])$ under the public key of the recipient.
Recall that the semantic security of ElGamal ensures the concealment of
plaintexts. In other words, for plaintexts $m$ and $m'$, a universal ciphertext
$(E[m]; E[1])$ is indistinguishable from another $(E[m']; E[1])$ to any entity
without knowledge of the corresponding private key. 

2. Universal mixing. Any server can be called upon to mix the contents of 
the bulletin board. This involves two operations: (1) The server re-encrypts 
all the universal ciphertexts on the bulletin board using URe, and (2) The 
server writes the resulting new ciphertexts back to the bulletin board in 
random order, overwriting the old ones. It is also desirable that a mix server 
be able to prove that it operated correctly. This can be done with a number 
of mixing schemes [6,9,11,13], and will be discussed in more detail below. 

3. Retrieval of the outputs. Potential recipients must try to decrypt every 
encrypted message output by the universal mixnet. Successful decryptions 
correspond to messages that were intended for that recipient. The others 
(corresponding to decryption output $\perp$) are discarded by the party
attempting to perform the decryption. Recall that our construction of
universal encryption based on ElGamal ensures a binding between ciphertexts and
keys, so that a given ciphertext can be decrypted only under one given key. 
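The three steps can be simulated end to end. In this minimal Python sketch (toy group, hypothetical helper names), the bulletin board is a plain list, each call to `mix` plays the role of one independent server that holds no keys, and each recipient trial-decrypts every entry, so retrieval cost grows linearly with the board size. Group-membership checks and proofs of correct mixing are omitted for brevity.

```python
import random
import secrets

# Toy group (assumption, insecure): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q_ORDER, G = 2039, 1019, 4


def rand_exp():
    return secrets.randbelow(Q_ORDER - 1) + 1  # nonzero element of Z_q


def ue(m, y):
    k0, k1 = rand_exp(), rand_exp()
    return ((m * pow(y, k0, P) % P, pow(G, k0, P)), (pow(y, k1, P), pow(G, k1, P)))


def ure(c):
    (a0, b0), (a1, b1) = c
    k0, k1 = rand_exp(), rand_exp()
    return ((a0 * pow(a1, k0, P) % P, b0 * pow(b1, k0, P) % P),
            (pow(a1, k1, P), pow(b1, k1, P)))


def ud(c, x):
    (a0, b0), (a1, b1) = c
    if a1 != pow(b1, x, P):        # second component does not decrypt to 1
        return None
    return a0 * pow(pow(b0, x, P), -1, P) % P


def mix(board):
    """One universal mixing step: re-encrypt every entry, then shuffle."""
    out = [ure(c) for c in board]
    random.SystemRandom().shuffle(out)
    return out


def retrieve(board, x):
    """Trial-decrypt every entry and keep those that decrypt under x."""
    return [m for m in (ud(c, x) for c in board) if m is not None]


# Step 1: senders post universal ciphertexts under the recipients' public keys.
alice_x, bob_x = 111, 222
alice_y, bob_y = pow(G, alice_x, P), pow(G, bob_x, P)
board = [ue(pow(G, 1, P), alice_y), ue(pow(G, 2, P), bob_y), ue(pow(G, 3, P), alice_y)]
# Step 2: three servers mix in turn; none of them holds any key.
for _ in range(3):
    board = mix(board)
# Step 3: each recipient scans the whole board.
assert sorted(retrieve(board, alice_x)) == [pow(G, 1, P), pow(G, 3, P)]
assert retrieve(board, bob_x) == [pow(G, 2, P)]
```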

Properties of the Basic Protocol: 

1. The universal mixnet holds no keying information. Public and private keys 
are managed exclusively by the players providing input ciphertexts and re- 
ceiving outputs from the mix. 

2. The universal mixnet guarantees only external anonymity. It does not pro- 
vide anonymity for senders with respect to receivers. Indeed a receiver can 
trace a message intended for her throughout the mixing process, since that 
message is encrypted under her public key. If ciphertexts are not posted 
anonymously, this means that the receiver can identify the players who have 
posted messages for her. This restriction to external anonymity is of lit- 
tle consequence for the applications we focus on, namely protection against 
traffic analysis, but should be borne in mind for other applications. 




Universal Re-encryption for Mixnets 






3. The chief drawback of universal mixnets is the overhead that they impose 
on receivers. Since the public keys corresponding to individual output ci- 
phertexts are unknown, a receiver must attempt to decrypt each output 
ciphertext in order to find those encrypted under her private key. Thus the 
overhead for receivers is linear in the size of the input batch. (We discuss 
ways below and in section 6 to reduce this overhead somewhat.) 

Low-volume anonymous messaging: anonymizing bulletin boards. For
simplicity, we have described above the operation of a universal mixnet in which
inputs are submitted, mixed and finally retrieved. This sequence of events is
characteristic of all mixes. Unlike regular mixes, however, universal mixes allow
for repeated interleaving of the submission, mixing and retrieval steps. What 
makes this possible is that the decryption is performed by the recipients of the 
message rather than by the mixnet, so that existing messages posted to the 
bulletin board are at all times indistinguishable from new messages. New inputs 
may be constantly added to the existing content of the bulletin board, and 
outputs retrieved, provided there is at least one round of mixing between every 
submission and retrieval to ensure privacy. 

This suggests a generalization of the private communication protocol de- 
scribed above, in which the bulletin board maintains at all times a pool of 
unclaimed messages. In other words, universal mixing lends itself naturally to 
the construction of an anonymizing bulletin board. Senders may add messages
and receivers retrieve them at any time, provided there is always at least one 
round of mixing between each posting and retrieval. This protocol appears well 
suited to guarantee anonymity from external observers in a system in which few 
messages are exchanged. The privacy of the protocol relies on the existence of 
a steady pool of undelivered messages rather than on a constant flow of new 
messages. The former condition appears much easier to satisfy than the latter 
in cases when the total number of exchanged messages is small. This pooling of 
messages affords good anonymity protection, without the usual lack of verifia- 
bility of correct performance that vexes such schemes [4]. 

A potential drawback of a bulletin board based on universal mixing is that 
one must download the full contents in order to be assured of obtaining all of 
the messages addressed to oneself. This becomes problematic if the number of 
messages on the bulletin board is permitted to grow indefinitely. To mitigate 
this problem, it is possible to have recipients remove the messages they have 
received.^ An anonymizing bulletin board based on universal mixing has the 
important privacy-protecting feature that removal of a particular message does 
not reveal which entity posted that message. Another important observation, 
as described in the next section, is that only a portion of each message on a 

^ To ensure that messages are only removed by the intended recipient, a proof of 
knowledge of the corresponding decryption key is required. Note that such a proof 
can be performed without disclosing the public key associated with the required 
decryption key. For ciphertext $C = [(\alpha_0, \beta_0); (\alpha_1, \beta_1)]$, this may take the form of
a non-interactive zero-knowledge proof of knowledge of an exponent $x$ such that
$\alpha_1 = \beta_1^x$, essentially a Schnorr signature [17].
bulletin board need be downloaded to allow a recipient to determine which
messages are intended for her. This further restricts the work required by a 
receiver. 
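The proof of knowledge used for message removal can be sketched as a Fiat-Shamir Schnorr proof with base $\beta_1$ instead of $g$: it shows knowledge of $x$ with $\alpha_1 = \beta_1^x$ without revealing $x$ or the public key $y$. The Python code below is a minimal sketch under stated assumptions: toy parameters, SHA-256 as the hash, and a free-form `context` string (for example, an identifier of the removal request) bound into the challenge. These choices, and the function names, are illustrative rather than taken from the paper.

```python
import hashlib
import secrets

# Toy group (assumption, insecure): p = 2q + 1, q prime.
P, Q_ORDER = 2039, 1019


def challenge(*values):
    """Fiat-Shamir challenge: SHA-256 of the transcript, reduced mod q."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q_ORDER


def prove(beta1, alpha1, x, context=""):
    """Prove knowledge of x such that alpha1 = beta1^x (mod P)."""
    w = secrets.randbelow(Q_ORDER)          # one-time secret nonce
    t = pow(beta1, w, P)                    # commitment
    c = challenge(beta1, alpha1, t, context)
    s = (w + c * x) % Q_ORDER               # response
    return t, s


def verify(beta1, alpha1, proof, context=""):
    """Accept iff beta1^s == t * alpha1^c (mod P)."""
    t, s = proof
    c = challenge(beta1, alpha1, t, context)
    return pow(beta1, s, P) == t * pow(alpha1, c, P) % P


G = 4                                       # generator of the order-q subgroup
x = 500                                     # recipient's private key
beta1 = pow(G, 77, P)                       # second component of a ciphertext
alpha1 = pow(beta1, x, P)                   # alpha1 = beta1^x for her key
proof = prove(beta1, alpha1, x, context="remove entry 17")
assert verify(beta1, alpha1, proof, context="remove entry 17")
```

Verification works because $\beta_1$ has order $q$, so $\beta_1^{s} = \beta_1^{w} (\beta_1^{x})^{c} = t\,\alpha_1^{c}$; only $\beta_1$ and $\alpha_1$ from the ciphertext appear, never $y$.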

RFID-tag privacy. Universal re-encryption may be used to enhance the pri- 
vacy of RFID tags. The idea is to permit powerful computing agents external 
to RFID tags to universally re-encrypt the tag data (recall that the tags lack 
the computing power necessary to do the re-encryption themselves). Thus, for 
example, a consumer walking home with a bag of groceries containing RFID 
tags might have the ciphertexts on these tags re-encrypted by computing agents 
provided as a public service by shops and banks along the way. In this case, 
the tags in the bag of groceries will periodically change appearance, helping to 
defeat any tracking attempt. 

Application of universal mixnets to RFID-tag privacy is different in some 
important respects from realization of an anonymous bulletin board. As re- 
encryption naturally occurs for RFID tags on an individual basis, re-encryption 
in this setting may be regarded as realizing an asynchronous mixnet. There is 
also a special security consideration in this setting. Suppose that the ciphertext 
on an RFID tag is of the form $[(\alpha, \beta); (1, 1)]$ (where 1 represents the identity
element of Q). Then the ciphertext on the tag will not change upon re-encryption.
Thus, it is important to prevent an active adversary from inserting such a ci- 
phertext onto an RFID tag so as to be able to trace it and undermine the privacy 
of the possessor. In particular, on processing ciphertexts, re-encryption agents 
should check that they do not possess this degenerate form. Of course, an ad- 
versary in this environment can always corrupt ciphertexts. Note, however, that 
even a corrupted ciphertext $[(\alpha', \beta'); (\gamma, \delta)]$ will be rendered unrecognizable to an
adversary provided that $\gamma, \delta \neq 1$.
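A re-encryption agent that follows this advice might look like the Python sketch below, a minimal illustration under assumptions (toy group, hypothetical function names). It refuses to process a tag ciphertext if either component of the second pair is the identity, which is slightly stricter than checking only for the $(1, 1)$ form: with $\alpha_1 = 1$ the component $\alpha_0$ would never change, and with $\beta_1 = 1$ the component $\beta_0$ would never change, so either case leaves the tag traceable.

```python
import secrets

# Toy group (assumption, insecure): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q_ORDER, G = 2039, 1019, 4


def is_degenerate(c):
    """A tag ciphertext whose second pair contains the identity cannot be re-randomized."""
    (_, _), (a1, b1) = c
    return a1 == 1 or b1 == 1


def agent_reencrypt(c):
    """Universal re-encryption as performed by an external agent (no key needed)."""
    if is_degenerate(c):
        raise ValueError("degenerate tag ciphertext; refusing to re-encrypt")
    (a0, b0), (a1, b1) = c
    k0 = secrets.randbelow(Q_ORDER - 1) + 1
    k1 = secrets.randbelow(Q_ORDER - 1) + 1
    return ((a0 * pow(a1, k0, P) % P, b0 * pow(b1, k0, P) % P),
            (pow(a1, k1, P), pow(b1, k1, P)))


# A well-formed tag ciphertext on m = g^2 under y = g^99 (factors fixed for the demo).
y, m = pow(G, 99, P), pow(G, 2, P)
tag = ((m * pow(y, 5, P) % P, pow(G, 5, P)), (pow(y, 7, P), pow(G, 7, P)))
fresh = agent_reencrypt(tag)
assert fresh[0] != tag[0]                                   # appearance changes
assert fresh[0][0] * pow(pow(fresh[0][1], 99, P), -1, P) % P == m
assert is_degenerate(((pow(G, 3, P), pow(G, 5, P)), (1, 1)))
```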

5 Security 

In this section, we define two security properties of universal mixnets: correct- 
ness and communication privacy. The mixnet is correct if the set of outputs 
it produces is a permutation of the set of inputs. The mixnet guarantees 
communication privacy if, when Alice sends a message to Bob and Cathy sends 
a message to Dario, an observer cannot tell whether Alice (resp. Cathy) sent a
message to Bob or Dario. 

Correctness. Correctness for universal mixnets follows directly from the defi- 
nition of correctness for standard mixnets. Like standard mix servers, universal 
servers must prove that they have performed the mixing operation correctly. 
For this, we can draw on essentially any of the proof techniques presented in the 
literature on mixnets, as nearly all apply to ElGamal ciphertexts. For example, 
to achieve universal verifiability, we can use the proof techniques in [6,13,11]. A 
small technical consideration, which may be dealt with straightforwardly, is the 
form of input ciphertexts. Input ciphertexts in most mix network constructions 
consist of a single ElGamal ciphertext, while in our construction, an input
consists of a universal ciphertext, and thus two related ElGamal ciphertexts. 

Communication privacy. We define next the property of communication 
privacy. In order to state this definition formally, we abstract away some of the 
operations of the mixnet by defining them in terms of oracle operations. We 
do this so as to focus our exposition on our universal construction, rather than 
underlying primitives, particularly as our construction can make use of a broad 
range of choices of such primitives. We define three oracles: 

• An oracle MIX which universally re-encrypts all ciphertexts on the bulletin 
board BB and outputs to BB the new set of ciphertexts in a randomly permuted 
order. In practice, we can substitute any mixnet with public verifiability for 
MIX. 

• An oracle POST that permits message posting. POST requires a poster 
to submit a message, encryption factors and ciphertext. It verifies that the 
message, encryption factors and ciphertext are elements of the appropriate 
groups and permits posting if the ciphertext is a valid encryption of the message 
with the given encryption factors. Note that the oracle POST may be regarded 
as simulating a proof of knowledge of the plaintext and the encryption factor 
and a verification thereof. In practice, it could be instantiated with standard 
discrete-log-based proofs of knowledge, e.g., [5], in either their interactive or 
non-interactive forms. 

• An oracle RETRIEVE that permits message retrieval. The oracle takes a 
private key and ciphertext from a user. The oracle verifies that the private 
key and ciphertext are elements of the appropriate groups. The user is allowed 
to remove the ciphertext if it is encrypted under the private key. Recall that 
our construction of universal encryption based on ElGamal ensures a binding 
between ciphertexts and keys, so that a given ciphertext can be decrypted only 
under one given key. The oracle RETRIEVE, like POST, abstracts away a proof of 
knowledge of the plaintext. 
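To make these oracle abstractions concrete, the following Python sketch implements POST, MIX and RETRIEVE as methods of an in-memory board. It is a hypothetical illustration over a toy group, not an instantiation from the paper: POST checks the submitted message and factors directly rather than through a proof of knowledge (recovering the recipient key $y$ from $\alpha_0 / m$ only internally), and RETRIEVE checks $\alpha_1 = \beta_1^x$ directly rather than through a proof of knowledge of $x$.

```python
import random

# Toy group (assumption, insecure): p = 2q + 1, g = 4 generates the order-q subgroup.
P, Q_ORDER, G = 2039, 1019, 4


def in_group(v):
    return 0 < v < P and pow(v, Q_ORDER, P) == 1


class BulletinBoard:
    """Hypothetical board exposing the POST, MIX and RETRIEVE oracles."""

    def __init__(self):
        self.entries = []

    def post(self, m, k0, k1, c):
        """Accept c only if it is UE(m, (k0, k1)) under some key y."""
        (a0, b0), (a1, b1) = c
        if not all(in_group(v) for v in (m, a0, b0, a1, b1)):
            return False
        if not (0 < k0 < Q_ORDER and 0 < k1 < Q_ORDER):
            return False
        # alpha0 / m = y^k0, so y = (alpha0 / m)^(k0^-1 mod q).
        y = pow(a0 * pow(m, -1, P) % P, pow(k0, -1, Q_ORDER), P)
        if b0 == pow(G, k0, P) and b1 == pow(G, k1, P) and a1 == pow(y, k1, P):
            self.entries.append(c)
            return True
        return False

    def mix(self):
        """Universally re-encrypt every entry, then shuffle the board."""
        rng = random.SystemRandom()
        fresh = []
        for (a0, b0), (a1, b1) in self.entries:
            k0, k1 = rng.randrange(1, Q_ORDER), rng.randrange(1, Q_ORDER)
            fresh.append(((a0 * pow(a1, k0, P) % P, b0 * pow(b1, k0, P) % P),
                          (pow(a1, k1, P), pow(b1, k1, P))))
        rng.shuffle(fresh)
        self.entries = fresh

    def retrieve(self, x, c):
        """Remove c if it is bound to private key x, i.e. alpha1 == beta1^x."""
        if c not in self.entries or not (0 < x < Q_ORDER):
            return False
        (_, _), (a1, b1) = c
        if a1 != pow(b1, x, P):
            return False
        self.entries.remove(c)
        return True


x = 77
y, m, k0, k1 = pow(G, x, P), pow(G, 9, P), 10, 20
c = ((m * pow(y, k0, P) % P, pow(G, k0, P)), (pow(y, k1, P), pow(G, k1, P)))
bb = BulletinBoard()
assert bb.post(m, k0, k1, c)
assert not bb.post(m, k0, k1 + 1, c)      # wrong encryption factor is rejected
bb.mix()
(entry,) = bb.entries
assert not bb.retrieve(x + 1, entry)      # wrong key cannot remove it
assert bb.retrieve(x, entry) and bb.entries == []
```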

We define communication privacy in terms of an experiment 
defined as follows. The adversary may make an arbitrary number of calls to any 
of the oracles RETRIEVE, MIX, or POST and may order these calls as desired. We 
enumerate the first several steps here for reference in our proof. 

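Experiment $\mathrm{Exp}^{comm\text{-}priv}_A(UCS, k)$

1. $PK_0 \leftarrow UKG$; $PK_1 \leftarrow UKG$;

2. $(m_0, m_1) \leftarrow A(PK_0, PK_1, \text{“specify plaintexts”})$;

3. $b \in_U \{0, 1\}$;

4. $C'_0 = UE_{PK_b}(m_b)$ and $C'_1 = UE_{PK_{1-b}}(m_{1-b})$ appended to BB;

5. MIX invoked;

6. $A(BB)$;

7. $L \leftarrow \{C \in BB \text{ s.t. } C \text{ is a valid ciphertext under } PK_0\}$;

8. $b' \leftarrow A(L, \text{“guess } b\text{”})$;

if $b = b'$ then output ‘1’ else output ‘0’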

An intuitive description of this experiment is as follows. Alice and Bob wish
each to transmit a single message to one of Cathy and Dario, who possess public
keys $PK_0$ and $PK_1$ respectively. Our aim is to ensure that the adversary cannot
tell whether Alice is sending a message to Cathy or to Dario, and likewise to
whom Bob is transmitting. The adversary is given the special (strong) power of
determining which plaintexts, $m_0$ and $m_1$, are to be received by Cathy and Dario.
The adversary observes Alice posting ciphertext $C'_0$ and Bob posting ciphertext
$C'_1$, but does not know which ciphertext is for Cathy and which is for Dario.
The bulletin board is then subjected to a mixing operation so as to conceal the 
communication pattern. The adversary may subsequently control when and how 
the mix network is invoked, and may place its own ciphertexts on the bulletin 
board. Finally, at the end of the experiment, the adversary is given a list L of 
all ciphertexts encrypted under $PK_0$, i.e., all the messages that Cathy retrieves.
This list $L$ will include the one such message posted by Alice or Bob in addition
to all messages encrypted under $PK_0$ and posted by the adversary. The task
of the adversary is to guess whether it was Alice who sent a message to Cathy
(case $b = 0$) or Bob (case $b = 1$).

Definition 1 (Communication privacy). We say that a universal mixnet for
UCS possesses communication privacy if for any adversary $A$ that is polynomial
time in $k$, the quantity $\Pr[\mathrm{Exp}^{comm\text{-}priv}_A(UCS, k) = 1] - 1/2$ is negligible in $k$.

Theorem 1. Our universal mixnet possesses communication privacy provided
that UCS has universal semantic security under re-encryption. For our
construction involving ElGamal, privacy may consequently be reduced to the DDH
assumption over Q.

Proof. Assume we have an adversary $A$ for which $\Pr[\mathrm{Exp}^{comm\text{-}priv}_A(UCS, k) = 1] - 1/2$
is non-negligible in $k$. We build a new adversary $A'$ which uses $A$ as
a subroutine and for which $\Pr[\mathrm{Exp}^{uss}_{A'}(UCS, k) = 1] - 1/2$ is non-negligible in
$k$ (i.e., $A'$ breaks the universal semantic security of the underlying encryption
scheme). $A'$ operates as follows:

- At the beginning of $\mathrm{Exp}^{uss}$, $A'$ is given two public keys $PK_0$ and $PK_1$. $A'$
gives these two keys to $A$. This simulates step 1 of $\mathrm{Exp}^{comm\text{-}priv}$.

- When $A$ calls one of the oracles POST, MIX or RETRIEVE, $A'$ can trivially
simulate the oracle for the requested operation for $A$.

- In step 2 of experiment $\mathrm{Exp}^{comm\text{-}priv}$, $A$ specifies plaintexts $m_0$ and
$m_1$. $A'$ selects random encryption factors $r_0$ and $r_1$ and computes
$C_0 = UE_{PK_0}(m_0, r_0)$ and $C_1 = UE_{PK_1}(m_1, r_1)$. $A'$ submits these (that is,
$(m_0, m_1, r_0, r_1)$) in the second step of $\mathrm{Exp}^{uss}$. $A'$ then receives as input from
$\mathrm{Exp}^{uss}$ two new ciphertexts $C'_0$ and $C'_1$.

- In step 4 of $\mathrm{Exp}^{comm\text{-}priv}$, $A'$ posts $C'_0$ and $C'_1$ to the bulletin board.

- In step 7 of $\mathrm{Exp}^{comm\text{-}priv}$, $A'$ must identify the set of outputs encrypted
under $PK_0$. Note that $A'$ can easily identify the outputs that correspond to
inputs originally submitted by $A$ encrypted under $PK_0$, since it controls the
oracles POST and MIX. The difficulty is for $A'$ to decide which of $C'_0$ and $C'_1$
is encrypted under $PK_0$ and which under $PK_1$. Since $A'$ does not know that,
it arbitrarily assigns $C'_0$ to the list $L$ of ciphertexts encrypted under $PK_0$.

In the last step of the simulation, $A'$ assigns $C'_0$ arbitrarily to $L$. We claim
that if $A$ can distinguish between the case where this assignment to $L$ is
correct and the case where it is incorrect, then $A$ can be used to break universal
semantic security in $\mathrm{Exp}^{uss}$. This may be achieved with a small modification of
our simulation as follows: (1) $A'$ lets $C'_0 = C_0$ and $C'_1 = C_1$, but invokes $\mathrm{Exp}^{uss}$
on the pair $(C'_0, C'_1)$ during the mixing operation in step 5, and (2) $A'$ submits
to $\mathrm{Exp}^{uss}$ the bit $b'$ yielded by $A$ at the end of the experiment. Let us assume,
therefore, that the assignment to $L$ is correct. Given this, when $A$ outputs its
guess $b'$, $A'$ then outputs the same bit $b'$ as its guess for the experiment $\mathrm{Exp}^{uss}$.
It is clear now that when $A$ guesses correctly, so does $A'$. □

Security of UCS and chosen-ciphertext attacks. The cryptosystem UCS 
inherits the semantic security property of the underlying ElGamal cipher under 
the DDH assumption. This property is critical to our definition of communica- 
tions privacy. Our model for communication privacy makes one simplifying as- 
sumption though: We assume that the adversary does not learn any information 
about plaintexts. For this reason, we do not require adaptive chosen-ciphertext
(CCA) security of our cryptosystem. In fact, our system cannot achieve strict
CCA security: In order to permit re-encryption, ciphertexts must be malleable.
Note, however, that an adversary cannot repost a message or post a new message 
with a related plaintext since POST requires a proof of knowledge of the plaintext 
and encryption factors. 

On the other hand, there may be circumstances in which an adversary learns 
information about plaintexts in our system. To address this formally, it would be
necessary to modify our universal cryptosystem so as to achieve CCA security
with benign malleability, as defined by Shoup [18]. In Shoup’s terminology, we
would need to require an induced compatible relation of plaintext equivalence
by formatting plaintexts with appropriate padding. We omit detailed discussion 
of this topic. An adversary that can gain significant information about received 
messages can, after all, break the basic privacy guarantees of the system. 

6 Hybrid Universal Mixing 

We describe next a variant mixnet called a hybrid universal mixnet. This type of 
mixnet combines symmetric and public-key encryption to accommodate poten- 
tially very long messages (all of the same size) in an efficient manner. We refer 
the interested reader to [10] for definitions and examples of hybrid mixnets. 

Our definition of a universal hybrid mix considers a weaker threat model than 
above with respect to correctness. Since hybrid mixes use symmetric encryption, 
we cannot verify that they execute the protocol correctly. Thus, we restrict our 
security model to mix servers subject only to passive adversarial corruption. Such 
servers are also known as honest-but-curious. They follow the protocol correctly
but try to learn as much information as possible from its execution. 

For efficiency, inputs m are submitted to a hybrid mix encrypted under an 
initial symmetric (rather than public) key. We denote by $e_k[m]$ the symmetric-key
encryption of $m$ under key $k$. Each mix server $S_i$ re-encrypts the output
of the previous mix under a new random symmetric key $k_i$. With $n$ servers,
the final output of the mix is $e_{k_n}[\ldots e_{k_1}[e_k[m]] \ldots]$. The symmetric keys
$k, k_1, \ldots, k_n$ must be conveyed alongside the encrypted message to enable
decryption by the final recipient. These keys are themselves universally encrypted
under the public key of the recipient. Universal encryption provides an efficient 
way of transmitting the symmetric keys without compromising privacy. 

We define next our hybrid universal mixnet. Our construction imposes an 
upper bound n on the maximum number of times that the mixing operation is 
performed on any given ciphertext. The protocol consists of the following steps: 

1. Submission of inputs. An input ciphertext takes the form 

$e_{k_0}[m], E[1], (E[k_0], E[1], \ldots, E[1])$

where $e_{k_0}[m]$ denotes symmetric encryption of $m$ under key $k_0$. This is
followed by an encryption of 1, and by a vector of ciphertexts on keys, where
only the first element is filled in (with $k_0$), leaving the remaining $n - 1$
elements as encryptions of 1.

2. Universal mixing. The server performing the mixing operation does
the following for each of the ciphertexts on the bulletin board:

- Generates a random symmetric key $k_i$;

- Adds a new layer of symmetric encryption to m under key $k_i$;

- Uses the second element, $E[1]$, to compute an encryption $E[k_i]$ of $k_i$;

- Rotates the elements of the vector one step leftwards, then substitutes
the first element with $E[k_i]$; and

- Re-encrypts the second element and each element of the vector.

When it has thus processed all its inputs in this manner, the server outputs 
them back to the bulletin board in a random order. 

3. Retrieval of the outputs. At the end of $d < n$ mixing operations, the final
output of the mixnet assumes the form:

$\varepsilon_{k_d}[\ldots\varepsilon_{k_0}[m]\ldots],\; E[1],\; (\{E[1]\}^{n-d}, E[k_0], \ldots, E[k_d])$,

where $\{E[1]\}^{n-d}$ denotes $n-d$ ElGamal ciphertexts on the identity element.
As before, recipients try to decrypt every output of the mixnet and discard
outputs for which the decryption fails. Note that a party need only decrypt
the second element, $E[1]$, to determine whether a ciphertext is for her.
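The public-key operations in steps 2 and 3 rest on ElGamal's multiplicative homomorphism. Three operations are shown: turning a blank $E[1]$ into $E[k_i]$, re-randomizing a blank without the recipient's public key, and the recipient's test on the second element. This is a toy sketch with several assumptions. It uses plain single-pair ElGamal $(m y^r, g^r)$ over a deliberately tiny safe-prime group (p = 2039, q = 1019, g = 4; hypothetical parameters), and $k_i$ is assumed to be encoded as a group element. Re-encrypting non-blank entries such as $E[k_i]$ in the full universal scheme uses the paired universal ciphertext form, which this sketch omits. All function names are hypothetical.

```python
import secrets

# Toy safe-prime group (hypothetical, far too small for real use):
# p = 2q + 1 with q prime, and g = 4 generates the subgroup of order q.
P, Q, G = 2039, 1019, 4

def rand_exp() -> int:
    return secrets.randbelow(Q - 1) + 1          # uniform in [1, q - 1]

def keygen():
    x = rand_exp()                               # recipient's private key
    return x, pow(G, x, P)                       # (x, y = g^x)

def encrypt(y: int, m: int):
    """ElGamal: E[m] = (m * y^r, g^r)."""
    r = rand_exp()
    return (m * pow(y, r, P) % P, pow(G, r, P))

def decrypt(x: int, ct) -> int:
    alpha, beta = ct
    return alpha * pow(beta, -x, P) % P          # alpha / beta^x (Python 3.8+)

def reencrypt_blank(ct):
    """Re-randomize E[1] = (y^r, g^r) without y: raise both parts to s."""
    s = rand_exp()
    alpha, beta = ct
    return (pow(alpha, s, P), pow(beta, s, P))

def embed_key(blank, k: int):
    """Turn a blank E[1] into E[k] by multiplying its first component by k."""
    alpha, beta = blank
    return (alpha * k % P, beta)

def is_for_me(x: int, blank) -> bool:
    """Step 3 test: the second element decrypts to 1 only under the right key."""
    return decrypt(x, blank) == 1
```

With a wrong private key $x'$, the blank decrypts to $g^{(x - x')rs} \ne 1$, so the test rejects it. Each recipient still has to run this test on every output, which is the linear receiver cost noted in the conclusion.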

Remark: In principle, it is possible to use the “blank” ciphertext $E[1]$ to append
ciphertexts on as many symmetric keys as desired, and thus re-encrypt indefi- 
nitely. The reason for restricting the number of “blank” ciphertexts to exactly 
n is to preserve a uniform length, without which an adversary can distinguish 
among ciphertexts that have undergone differing numbers of re-encryptions. A 
drawback of this approach is that a ciphertext re-encrypted more than n times 
will become undecipherable by the receiver. 
7 Conclusion

Universal re-encryption represents a simple modification to the basic ElGamal 
cryptosystem that permits re-randomization of ciphertexts without knowledge of 
the corresponding private key. This provides a valuable tool for the construction 
of privacy-preserving architectures that dispense with the complications and 
risks of distributed key setup and management. The costs for the basic universal 
cryptosystem are only twice those of ordinary ElGamal. On the other hand, the 
problem of receiver costs in a universal mixnet presents a compelling line of 
further research. In our construction, a receiver must perform a linear number 
of decryptions to identify messages intended for her. A method for reducing this 
cost would be appealing from both a technical and practical standpoint. 
Abstract. We analyze a new class of primitives called weak commitments.
We completely characterize when bit commitments can be reduced to
these primitives. We also employ a new concept in cryptographic
reductions, the rate of a reduction, and propose protocols achieving
a nontrivial rate. We provide examples of how to implement these
primitives based on oblivious transfer and on quantum mechanics. Using
the theory developed here, we solve some open problems on computationally
secure quantum bit commitments. Our reductions are information
theoretically secure.



1 Introduction 

Whenever a cryptographic primitive is implemented by a physical process there 
is a certain probability of failure or even the possibility that a cheater has (lim- 
ited) control over system parameters of the physical process. In this paper we 
will introduce a new primitive, a weak bit commitment, where the sender (Alice) 
and the receiver (Bob) can cheat with a certain probability thereby reflecting 
the above problem. This paper will give tight bounds on the cheating probabil- 
ities for which bit commitment can be reduced to weak bit commitments in an 
information theoretically secure way. 

But we will not only look at the possibility of reducing bit commitments to
weak bit commitments; we will also look at the efficiency of such reductions.
To do this we consider bit string commitments and use as a measure of efficiency
the new concept of the rate of an information theoretically secure reduction, i.e.,
the length of the bit string divided by the number of weak bit commitments
employed, in the limit as the length of the string goes to infinity [16]. This
allows us to import methods from Shannon theory to achieve our reductions. Our
bit string commitment reductions will exhibit nonzero rates.

Definition 1. (Informal) A bit (string) commitment is a protocol consisting of
two phases: a commit phase and an unveil phase (which need not be entered).

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 179-193, 2004. 
During the commit phase Alice is supposed to fix a bit (string) b which is not to
be changed and will be opened to Bob in the unveil phase. We say that a protocol
securely realizes bit (string) commitment if the protocol is concealing, binding,
and sound. Concealing: A bit (string) commitment scheme is concealing if
Bob's information about the committed bit (string) before the unveil phase is
negligible in the security parameter¹. Binding: A bit (string) commitment is
binding if the probability that Alice is able, after the commit phase, to unveil
more than one bit (string) is negligible in the security parameter. Sound: A bit
(string) commitment is sound if its probability of failure for honest Alice and
honest Bob is negligible in the security parameter.



It is important to remark that the above definition is valid only for classical 
bit (string) commitments. Secure quantum bit commitments are defined in a 
different way (see Section 6). 

In a weak bit commitment Alice will be able to violate the binding property
with a probability bounded by $\alpha$, and Bob will learn the committed bit already
in the commit phase with a probability bounded by $\beta$. Two types of weak
commitments will be distinguished depending on whether Alice knows in advance
if her attempt to cheat will remain undetected. In a weak bit commitment of
type I (see Definition 2) Alice knows if she can cheat without being detected,
and this happens with a probability no greater than $\alpha$. Bob will learn
the committed bit during the commit phase with a probability no greater than
$\beta$. The notion of a weak bit commitment of type II is cryptographically stronger
(see Definition 3). Alice will be able to cheat without being detected with a
probability no greater than $\alpha$, but she will not know if this is the case, and she
will risk being detected cheating if she tries. Bob again learns the
committed bit beforehand with a probability no greater than $\beta$.

Note that the actual probabilities of cheating are not specified, but bounded
by $\alpha$ and $\beta$ respectively. This is very important as it reflects an unfair
primitive which might work a lot better if no one tries to tweak system parameters.
Imagine a weak bit commitment where the probability that Bob learns the
committed bit beforehand is guaranteed to be exactly $\beta$; then one could perform n
commitments to random bits and ask Bob to announce the approximately $\beta n$ bits
he knows. These bits could be checked by Alice, who would thereby be convinced
that Bob knows a lot less about the remaining commitments, which could then be
used to implement a stronger commitment. The protocol sketched above would
not work in our case, as Bob can always claim not to have learned a single bit.

Later in the paper we give two examples of reductions to weak bit commitments.
First, we will show how to obtain bit string commitment with a nonzero
rate from Rabin oblivious transfer (in [11], Kilian proved that OT implies BC,
but his reduction had a rate asymptotically equal to zero) and from 1-out-of-2
oblivious transfer. We show that 1-out-of-2 OT is strictly stronger (in the sense

¹ As we are interested in the asymptotic rate of reductions, we look at the
limit of the rate as the length of the string goes to infinity. In this situation we can
identify the security parameter with the length of the string to be committed.




Bit String Commitment Reductions with a Non-zero Rate 






that better rates can be achieved) than Rabin OT when it comes to implement- 
ing string commitments. Second, we will improve a quantum bit commitment 
scheme based on computational assumptions by making it robust against mul- 
tiple photons which could be emitted in one pulse. This was stated as an open 
problem in [10]. 

An interesting innovation in this work is the use of well-known tools from
Shannon theory like typical sequences, entropic inequalities and random coding 
arguments. Actually, we prove that when appropriately written, some reductions 
among cryptographic primitives are equivalent to classical problems in Shannon 
theory. This is a new research direction which opens interesting questions both 
in Shannon theory and in cryptography. 



1.1 Related Work 

Several researchers have been trying to base the security of cryptographic protocols
on somewhat “weaker” primitives. However, previous works concentrated mostly
on proving the existence of certain reductions and thus usually did not pay
attention to the question of rate. Crépeau and Kilian [3] proved that a noisy
channel can be used to achieve oblivious transfer and bit commitment. Crépeau [7]
improved the results of [3] by using privacy amplification techniques.
However, no impossibility results were proven in these papers, and there was
no concern about achieving nonzero rates.

Cachin reduced OT to a weaker primitive called universal oblivious transfer 
in [2], but here again, the aspect of rate was not considered. 

Our weak commitment is a particular case of a weak generic transfer (WGT),
a primitive introduced by Damgård, Salvail and Kilian in [8]. As another
particular case of a WGT they defined a weak oblivious transfer. A weak oblivious
transfer is an oblivious transfer protocol where Alice and Bob can successfully
cheat with non-negligible probability. They characterized when oblivious transfer
can be reduced to its weaker version. However, in their reductions the achieved
rates were asymptotically equal to zero. Also, their approach was different from
the one taken here, which formulates reductions as Shannon-theoretic
problems.

In [9] and [15] string commitments achieving nontrivial rates were
introduced. However, these reductions were not information theoretically secure for
both the sender and the receiver of a commitment.

The work which most overlaps with ours is [21], where the commitment
capacity of a discrete memoryless channel (and of a class of quantum channels)
was determined. Although the techniques we use are similar to the ones used
in [21], there the direct part was proved by using a random code (which cannot
even be efficiently stored), whereas here we prove that our rates are achievable
by a random linear code (which can be efficiently stored). Since our algorithms
do not require efficiently decodable codes, our solutions are practical.
Also, we provide a general theory of weak commitments, whereas in [21] only
the particular case of noisy channels was analyzed.
2 Preliminaries and Statement of Results

A weak bit commitment is a protocol where Alice and Bob can successfully
cheat with non-negligible probability. Here, we split weak commitments into two
categories according to the way they are implemented. Weak commitments of
type I are the ones where the sender, with probability at most $\alpha$, knows that
she can cheat without being detected, and a dishonest receiver is able to break
the concealing condition of the commitment with probability at most $\beta$.

In weak commitments of type II, the sender does not know a priori whether she can
successfully cheat or not, but if she tries to cheat, she succeeds with probability
at most $\alpha$. As in weak commitments of type I, the receiver can break the secrecy
of the commitment before the unveiling phase with probability at most $\beta$.

We show in the following sections that whether or not the sender knows in
advance that she can cheat without being detected makes a difference in the
security analysis and in the results that can be achieved.

To construct an example of a weak commitment of type I, we use a protocol
proposed by Rivest in [17]. Rivest proved that if a trusted authority makes
some pre-distributed data available to Alice and Bob (he called this the trusted
initializer (TI) model), unconditionally secure bit commitment can be implemented.
In our version, the TI, after distributing secret data to Alice and Bob, with
probability $\alpha$ tells Alice the secret data which was given to Bob and assures
her that she will be able to cheat successfully. With probability $\beta$, the TI tells Bob
the secret data which was given to Alice. In this scenario, if Alice receives Bob's
secret data from the TI she knows for sure that she is able to cheat without being
detected. As examples of weak commitments of type II we cite the commitment
schemes based on noisy channels such as the ones proposed in [7] and in [8]. More
examples are given in Sections 5 and 6. We formalize our definitions below.

Definition 2. An $(\alpha,\beta)\text{-}\mathrm{bc}(b)$ of type I, $(\alpha,\beta)\text{-}\mathrm{bc}_{I}(b)$ for short, is a bit
commitment protocol where additional information is provided to the participants. It
consists of two phases: $\beta\text{-}\mathrm{Commit}_{I}(b)$ and $\alpha\text{-}\mathrm{Open}_{I}(b)$. In an $\alpha\text{-}\mathrm{Open}_{I}(b)$
algorithm the committer receives extra information (which makes it possible for
the sender to cheat without being detected) from an oracle with probability $\alpha$. In
a $\beta\text{-}\mathrm{Commit}_{I}(b)$ algorithm Bob, the receiver, learns the value of a committed
bit with probability at most $\beta$.



Definition 3. An $(\alpha,\beta)\text{-}\mathrm{bc}(b)$ of type II, $(\alpha,\beta)\text{-}\mathrm{bc}_{II}(b)$ for short, consists of
two phases: $\beta\text{-}\mathrm{Commit}_{II}(b)$ and $\alpha\text{-}\mathrm{Open}_{II}(b)$. In an $\alpha\text{-}\mathrm{Open}_{II}(b)$ algorithm
the committer is able to commit to something and later on unveil both $b = 0$ and
$b = 1$ with success probability at most $\alpha$. In a $\beta\text{-}\mathrm{Commit}_{II}(b)$ algorithm, Bob,
the receiver, learns the value of a committed bit with probability at most $\beta$.

It is clear that a bit commitment can be reduced to a wide class of weak
commitments. In the following we denote sequential commitments to $b_i$, $1 \le i \le n$,
using an $(\alpha,\beta)\text{-}\mathrm{bc}(b)$ protocol by the notation $(\alpha,\beta)\text{-}\mathrm{bc}(b_1 b_2 \ldots b_n)$.
When the same proof applies to weak commitments of type I and II, in the
following we denote a generic weak commitment by $(\alpha,\beta)\text{-}\mathrm{bc}(b)$.

Proposition 1. A bit commitment protocol $\mathrm{bc}(b)$ can be reduced to $(\alpha,0)\text{-}\mathrm{bc}_{I}(b)$
and $(\alpha,0)\text{-}\mathrm{bc}_{II}(b)$ protocols where $\alpha < 1$.

Proof. As the same proof applies to weak commitments of type I and II, we
use a generic weak commitment $(\alpha,0)\text{-}\mathrm{bc}(b)$. When the bit
to be committed is b, the committer commits to the same value b n
times, using an $(\alpha,0)\text{-}\mathrm{bc}(b)$ protocol; the receiver only accepts a commitment if
all the committed bits are the same. Even if Alice does not stick to the protocol
and commits to different bits, she has committed to some bit b in at least n/2
instances of the weak bit commitment protocol. To unveil $\bar{b}$ she has to change no
fewer than n/2 commitments. Alice's probability of successfully cheating ($P_e$) is
no greater than $\alpha^{n/2}$, and $\lim_{n\to\infty} P_e \le \lim_{n\to\infty} \alpha^{n/2} = 0$.
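The repetition construction and its trade-off can be made concrete in a few lines. This minimal sketch models only Bob's acceptance rule and the two quantities in the proof: the bound $\alpha^{n/2}$ and the rate $1/n$. The weak commitments themselves are not simulated, and the function names are hypothetical.

```python
def repetition_commit(b: int, n: int) -> list:
    """Proposition 1: commit to b via n weak commitments to the same bit."""
    return [b] * n

def receiver_accepts(opened) -> bool:
    """Bob accepts only if every opened weak commitment shows the same bit."""
    return len(set(opened)) == 1

def cheating_bound(alpha: float, n: int) -> float:
    """Upper bound alpha^(n/2) on Alice's probability of opening the other bit."""
    return alpha ** (n / 2)

def repetition_rate(n: int) -> float:
    """One committed bit per n weak commitments."""
    return 1 / n
```

For example, with $\alpha = 0.5$ and n = 20 the bound is already below $10^{-3}$, but the rate has dropped to 0.05. This vanishing rate is the cost that the string commitment reductions of the following sections avoid.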



Proposition 2. A bit commitment protocol $\mathrm{bc}(b)$ can be reduced to $(0,\beta)\text{-}\mathrm{bc}_{I}(b)$
and $(0,\beta)\text{-}\mathrm{bc}_{II}(b)$ protocols, $\beta < 1$.

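Proof. Again, we denote a generic weak commitment by $(\alpha,\beta)\text{-}\mathrm{bc}(b)$. When
the committed bit is b, Alice chooses a random n-bit word $b_1 b_2 \ldots b_n$ such that
$b = b_1 \oplus b_2 \oplus \cdots \oplus b_n$, where $\oplus$ stands for the XOR operation. Alice sequentially
commits to $b_i$, $1 \le i \le n$, using a $(0,\beta)\text{-}\mathrm{bc}(b)$ protocol. We represent this by
the notation $(0,\beta)\text{-}\mathrm{bc}(b_1 b_2 \ldots b_n)$, where the committed bit is equal
to the XOR of $b_1 b_2 \ldots b_n$. Let S be the random variable associated with
$b = b_1 \oplus b_2 \oplus \cdots \oplus b_n$ and Z the random variable associated with the commitments
observed (broken) by Bob. Unless Bob breaks all n weak commitments, the shares he
has not observed are uniformly random and independent of b, so Z reveals nothing
about S; the probability that he breaks all of them vanishes as n grows. Hence,
for uniformly distributed b, $\lim_{n\to\infty} H(S|Z) = 1$, i.e., the scheme is concealing.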
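The XOR splitting in this proof can be checked directly. The minimal sketch below assumes that each weak commitment is broken independently with probability exactly $\beta$. That is a modeling assumption: the primitive only guarantees "at most $\beta$". The sketch estimates how often Bob sees every share, which is the only case in which he learns b. Function names are hypothetical.

```python
import random

def split_xor(b: int, n: int, rng: random.Random) -> list:
    """Random shares b_1..b_n with b_1 XOR ... XOR b_n = b."""
    shares = [rng.randrange(2) for _ in range(n - 1)]
    last = b
    for s in shares:
        last ^= s
    return shares + [last]

def combine(shares) -> int:
    """Recover the committed bit as the XOR of all shares."""
    out = 0
    for s in shares:
        out ^= s
    return out

def estimate_bob_learns_b(beta: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P[Bob breaks all n weak commitments].

    Assumes (hypothetically) that each weak commitment is broken
    independently with probability beta.
    """
    rng = random.Random(seed)
    hits = sum(all(rng.random() < beta for _ in range(n)) for _ in range(trials))
    return hits / trials
```

Under the independence assumption the estimate approaches $\beta^n$, for example 0.125 for $\beta = 0.5$ and n = 3.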

It is natural to ask when a bit commitment cannot be reduced to its weaker 
version. The next proposition shows us a class of weak commitments that cannot 
be used to achieve a bit commitment. 

Proposition 3. It is impossible to reduce a bit commitment protocol to an
$(\alpha,\beta)\text{-}\mathrm{bc}_{I}(b)$ when $\alpha + \beta \ge 1$.

Proof. If $\alpha + \beta = 1$ and we assume that the events “Alice can cheat” and “Bob can
cheat” are not independent, we can in the following assume a weak bit commitment
where either Alice can cheat or Bob can cheat. Bit commitment cannot
be implemented with such a weak primitive: all bits learned by Bob during the
execution of a bit commitment based on such a weak primitive should not give
away the bit Alice commits to. Hence the bit Alice is committed to is not fixed
by the information Bob received, but all remaining bits can be changed by Alice.
Thus no bit commitment built with a weak bit commitment with $\alpha + \beta \ge 1$ can
be both binding and concealing.
Propositions 1 and 2 tell us that, in principle, a “strong” bit commitment
protocol can be reduced to its “weaker” versions. However, as the cheating prob- 
ability goes to zero, the rate also goes to zero, since n weak commitments are 
used to achieve one single bit commitment. The main contribution of this paper 
is to show some reductions where the cheating probability goes to zero, but the 
rate converges to a number bounded away from zero. The key point is to look 
at string commitment protocols instead of single bit commitments. We state the 
main theorem of this paper which shows that the bound proved in the last propo- 
sition is tight. We present the proof of this theorem in the following sections. 
For a formal definition of rate we refer the reader to the next section.

Theorem 1. A bit commitment protocol can be reduced to any $(\alpha,\beta)\text{-}\mathrm{bc}_{I}(b)$
with $\alpha + \beta < 1$ and any $(\alpha,\beta)\text{-}\mathrm{bc}_{II}(b)$ with $\alpha < 1$, $\beta < 1$. Moreover, a rate
$R = 1 - \alpha - \beta$ is achievable in the case of commitments of type I and $R = 1 - \beta$
is achievable in the case of commitments of type II.
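The two rate expressions in Theorem 1 can be wrapped in a small helper for comparing primitives. The sketch below encodes only the theorem's statement. The convention of returning 0.0 when its hypotheses fail is an assumption of the sketch: for type I with $\alpha + \beta \ge 1$, Proposition 3 rules out any reduction. The function name is hypothetical.

```python
def achievable_rate(alpha: float, beta: float, weak_type: str) -> float:
    """Rate from Theorem 1 for an (alpha, beta) weak commitment.

    Type I requires alpha + beta < 1 and gives 1 - alpha - beta;
    type II requires alpha < 1 and beta < 1 and gives 1 - beta.
    Returns 0.0 when the theorem's hypotheses do not hold.
    """
    if weak_type == "I":
        return 1 - alpha - beta if alpha + beta < 1 else 0.0
    if weak_type == "II":
        return 1 - beta if alpha < 1 and beta < 1 else 0.0
    raise ValueError("weak_type must be 'I' or 'II'")
```

For instance, a (1/2, 1/2) weak commitment of type II gives 1/2, while a (1/2, 0) one of type II gives 1. Section 5 obtains these two cases from Rabin OT and 1-out-of-2 OT respectively.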

3 String Commitment Reductions Achieving Non-trivial 
Rates 

In this section we describe the new concept of rate in bit string commitment
reductions [16]. Informally, by rate we mean the ratio between the length of the
committed string and the number of weak commitments used.

Similar to error correcting codes in communication systems, we ask Alice to
encode the string she commits to into another, larger, string before she
uses the protocol $\beta\text{-}\mathrm{Commit}$. However, unlike in a communication
system, our codes add not only redundancy but also randomness, to prevent Bob
from learning the value of the commitment before the unveiling phase. Here, we
concentrate on non-interactive reductions. Thus, Alice encodes the string she wants
to commit to, here called s, into another, longer string x, and then commits to
each bit of x by using $\beta\text{-}\mathrm{Commit}$ protocols. Later on, to open the commitments,
Alice runs the algorithm $\alpha\text{-}\mathrm{Open}$, opens the commitment to each
bit of x and sends s to Bob. The information Bob receives during the executions
of the algorithm $\alpha\text{-}\mathrm{Open}$ is denoted by y (the domain of this information is
not important in our reductions). Bob performs a test based on the information
he received during the commit and opening phases. Based on the result of this
test, Bob accepts or rejects Alice's commitment to s.

In the following, the view of a player at a certain stage of a protocol is the set
of all the messages received by this player plus the random bits which were used
by him during the protocol. We denote by S the random variable associated
with s. X denotes the random variable associated with the encoding of s (X
is the input to the algorithm $\beta\text{-}\mathrm{Commit}$). Let Z denote the random variable
associated with Bob's view of the protocol after a commitment to s is performed,
that is, after the $\beta\text{-}\mathrm{Commit}$ algorithm is performed, and let Y represent Bob's
view after the weak commitments are opened, that is, after the $\alpha\text{-}\mathrm{Open}$ protocols
are performed.


Definition 4. A non-interactive reduction from $\mathrm{bc}(b_1 b_2 \ldots b_k)$
to $(\alpha,\beta)\text{-}\mathrm{bc}(x_1 x_2 \ldots x_n)$ consists of a pair of
mappings $E: \{0,1\}^k \to \{0,1\}^n$, $D: \{0,1\}^n \to \{0,1\}^k$ and a test $\beta(s, Y, Z) \in \{ACC, REJ\}$.
The rate of this reduction is defined as $R = k/n$. A rate R
is said to be achievable if there exists a pair (E, D) such that
$\lim_{n\to\infty} P[\beta(s, Y, Z) = ACC \wedge \beta(s', Y', Z) = ACC] = P_e = 0$ for any s and $s' \in \{0,1\}^k$
such that $s \ne s'$, and $\lim_{n\to\infty} I(S:Z)/k = 0$, where $I(\cdot)$ is the mutual
information function, Z is the random variable associated with Bob's view after a
$\beta\text{-}\mathrm{Commit}(E(b_1 b_2 \ldots b_k))$ algorithm is performed and Y represents his view
after the algorithm $\alpha\text{-}\mathrm{Open}(E(b_1 b_2 \ldots b_k))$ is executed ($Y'$ denotes a possible
cheating strategy for Alice).

A reduction that achieves the supremum of all the achievable rates is called
optimal. It is worth briefly discussing our notion of a concealing protocol,
$\lim_{n\to\infty} I(S:Z)/k = 0$. A stronger notion of secrecy could be proposed as in
[14]: $\lim_{n\to\infty} I(S:Z) = 0$. We prove that our reduction also achieves this
stronger notion of secrecy.



3.1 A Non-trivial Reduction to an $(\alpha,0)\text{-}\mathrm{bc}_{II}$ Protocol

Here, we compute a non-trivial achievable rate for a reduction to an $(\alpha,0)\text{-}\mathrm{bc}$
protocol. The mappings E and D in this reduction are given by a randomly
generated linear code. Namely, a binary generating matrix G (of dimensions
$k \times n$, $k = Rn$) is created at random. For every $R < 1$ there exists a constant
$\delta > 0$ such that a random binary matrix of size $Rn \times n$ defines a binary linear
code with minimum distance at least $\delta n$ except with probability not greater than
$2^{-(1 - R - h(\delta))n}$, where $h(\delta) = -\delta\log\delta - (1-\delta)\log(1-\delta)$ [7]. The mappings are
made available to both Alice and Bob. They are also supposed to know the cheating
probability $\alpha$. To commit to a uniformly chosen string $s = b_1 \ldots b_k$, Alice chooses
the bit string $x = x_1 \ldots x_n$ associated with $b_1 \ldots b_k$ and then runs $\mathrm{Commit}(x_1 \ldots x_n)$.
To open the commitment, Alice runs $\alpha\text{-}\mathrm{Open}(x_1 \ldots x_n)$. The receiver accepts the
commitment to the string $b_1 \ldots b_k = D(x_1 \ldots x_n)$ iff $x_1 \ldots x_n$ is a valid codeword.
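Bob's acceptance test needs only a membership check in the code, not decoding. The sketch below shows one way to do it. The representation is an assumption chosen for illustration: each row of G and each word is a Python int whose bit i is position i. The helper names (`random_generator`, `encode`, `xor_basis`, `is_codeword`) are hypothetical, and neither the weak commitments nor the inverse mapping D are modeled.

```python
import random

def random_generator(k: int, n: int, rng: random.Random) -> list:
    """Random k x n binary generator matrix G, one n-bit int per row."""
    return [rng.getrandbits(n) for _ in range(k)]

def encode(message_bits, rows) -> int:
    """Codeword x = sG over GF(2): XOR of the rows selected by s."""
    word = 0
    for bit, row in zip(message_bits, rows):
        if bit:
            word ^= row
    return word

def xor_basis(rows) -> dict:
    """Gaussian elimination over GF(2): map leading-bit index -> basis row."""
    basis = {}
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]          # clears bit `top`, touches only lower bits
    return basis

def is_codeword(word: int, basis: dict) -> bool:
    """Bob's test: does the opened word lie in the row space of G?"""
    while word:
        top = word.bit_length() - 1
        if top not in basis:
            return False
        word ^= basis[top]
    return True
```

Bob's work is therefore one elimination over G, plus a short reduction for each opened word.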

Proposition 4. All rates of reductions from $\mathrm{bc}(x_1 x_2 \ldots x_k)$ to $(\alpha,0)\text{-}\mathrm{bc}(y_1 y_2 \ldots y_n)$
which are below 1 are achievable.

Proof. By definition, an $(\alpha,0)\text{-}\mathrm{bc}(b)$ protocol is concealing, thus a dishonest
Bob can never cheat. The most general strategy for a dishonest Alice
is to commit to a word w (which may not be a valid codeword) and later on
unveil a valid codeword. The protocol is insecure if the probability that Alice can
unveil two valid codewords is non-negligible. For any $R < 1$ and large n, there
is a $\delta > 0$ such that the random linear code used by Alice and Bob has minimum
distance at least $\delta n$, with high probability. Suppose that Alice wants to be able
to unveil two different codewords $c_1$ and $c_2$ and that she has committed to a
general (not necessarily a codeword) w. As the minimum distance of the code
is at least $\delta n$, the Hamming distance between w and one of these two codewords,
w.l.o.g. say $c_1$, is at least $\delta n/2$. Thus, the probability that she can successfully open
$c_1$ is upper bounded by $\alpha^{\delta n/2}$, which goes to zero exponentially with n, and our
result follows.
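For small parameters, both quantities in this proof can be computed directly: the minimum distance d of a concrete generator matrix, and the resulting bound $\alpha^{d/2}$. The sketch below does this by brute force over all nonzero messages, so it only makes sense for small k. It assumes the same int-per-row representation as a generic choice, and its names are hypothetical.

```python
import random

def min_distance(rows) -> int:
    """Minimum Hamming weight over nonzero messages (brute force, small k only).

    A result of 0 means two different messages map to the same word.
    """
    best = None
    for mask in range(1, 1 << len(rows)):
        word = 0
        for i, row in enumerate(rows):
            if (mask >> i) & 1:
                word ^= row
        weight = bin(word).count("1")
        best = weight if best is None else min(best, weight)
    return best

def open_other_bound(alpha: float, d: int) -> float:
    """Bound alpha^(d/2) from the proof: at least d/2 weak commitments must be
    changed, each succeeding with probability at most alpha."""
    return alpha ** (d / 2)
```

For example, the two-row code {1100, 0011} has minimum distance 2, while duplicated rows give distance 0, i.e. a code on which Alice trivially equivocates.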

Note that, although we use random linear codes (which are not known to
be efficiently decodable), our algorithms are efficient, since Bob need not decode
anything in the protocol; he just checks whether the word opened by Alice is a codeword
or not.



3.2 A Reduction to an $(\alpha,\beta)\text{-}\mathrm{bc}_{II}$ Protocol with $\alpha, \beta < 1$

We now show a reduction of bit commitment to $(\alpha,\beta)\text{-}\mathrm{bc}_{II}(b)$ with $\beta \ne 0$
and $\alpha, \beta < 1$. The encoding used is the same as that used by Wyner in his
seminal “wiretap channels” paper [20]. In order to commit to a string $b_1 b_2 \ldots b_k$,
represented by a random variable S, Alice randomly selects r bits $j_1 j_2 \ldots j_r$, here
represented by the random variable J, and concatenates these two strings, forming
a $q = k + r$ bit word $b_1 b_2 \ldots b_k j_1 j_2 \ldots j_r$. Alice then proceeds with the encoding
scheme described in the reduction to an $(\alpha,0)\text{-}\mathrm{bc}$ protocol. A binary generating
matrix G of dimensions $Tn \times n$, $T = (k + r)/n$, where $r = \beta n$, is generated. The
overall rate of the reduction is $R = k/n$. As in the last section, the random variable
associated with the codewords is represented by X. The next proposition follows.
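The Wyner-style encoding differs from the Section 3.1 encoding only in the r fresh random bits appended to s before multiplying by G. The minimal sketch below illustrates this. It assumes each generator row is stored as an n-bit int, and `wyner_encode` is a hypothetical name.

```python
import random

def wyner_encode(s_bits, r: int, rows, rng: random.Random):
    """Encode s || j with a (k + r) x n generator; j is r fresh random bits.

    rows[i] is the i-th generator row stored as an n-bit int.
    """
    if len(s_bits) + r != len(rows):
        raise ValueError("generator must have k + r rows")
    j_bits = [rng.randrange(2) for _ in range(r)]
    word = 0
    for bit, row in zip(list(s_bits) + j_bits, rows):
        if bit:
            word ^= row
    return word, j_bits
```

As a usage example, k = 4, r = 6 and n = 16 give a rate R = k/n = 0.25 and r + k = 10 < n, the condition of Proposition 5. With $\beta = r/n = 0.375$, this rate stays below $1 - \beta = 0.625$.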

Proposition 5. The above described reduction is unconditionally binding
against the sender, Alice, if $T < 1$, i.e., $r + k < n$.

Proof. As we have to prove just the binding property, this proof follows directly
from Proposition 4. To see this, we observe that the above reduction is equivalent
(from the sender's security point of view) to a reduction to an $(\alpha,0)\text{-}\mathrm{bc}_{II}$.
Therefore, the reduction presented in the last section works and gives us a rate
$R = k/n < 1 - \beta$ (since $k < n - r = (1-\beta)n$).

Now we must show that this reduction provides unconditional concealment
against Bob. As already defined, Z represents the random variable associated
with the bits observed by Bob when a $\beta\text{-}\mathrm{Commit}(x_1 x_2 \ldots x_n)$ algorithm is
performed. From successive applications of the chain rule for entropy [6] we have:

$$H(S|Z) = H(S,Z) - H(Z) = H(S,X,Z) - H(X|S,Z) - H(Z) = H(S|X,Z) + H(X,Z) - H(X|S,Z) - H(Z) = H(S|X,Z) + H(X|Z) - H(X|S,Z).$$

Here we make use of Fano's inequality. For the convenience of the reader we state this
theorem below:

Theorem 2. Fano Inequality: Let X and Y be random variables, where X takes
values in an alphabet $\mathcal{H}$, and let $P_e$ be the probability of error when decoding X
from Y. Then $h(P_e) + P_e \log(|\mathcal{H}| - 1) \ge H(X|Y)$, where $h(\cdot)$ is the entropy function.

Define $\lambda$ as the probability of wrongly guessing the random variable X when
knowing Z and S. From Fano's inequality we have $H(X|Z,S) \le h(\lambda) + \lambda n$. Also,
we know that $H(S|Z) = H(S|X,Z) + H(X|Z) - H(X|S,Z)$. Since $H(X|Z) =
n(1-\beta)$ (as Bob breaks $\beta n$ weak commitments on average), $H(X|S,Z) \le h(\lambda) +
\lambda n$ and $H(S|X,Z) = 0$, it follows that

$$H(S|Z) \ge n(1-\beta) - h(\lambda) - \lambda n = n(1-\beta) - h(\lambda) - \lambda n + k - k.$$

As $\beta = r/n$, we have $H(S|Z) \ge k + n - r - k - h(\lambda) - \lambda n$, so
$H(S|Z)/k \ge 1 + \frac{n-r-k}{k} - \frac{h(\lambda)}{k} - \frac{\lambda}{R}$; but $r + k < n$, hence

$$H(S|Z)/k \ge 1 - \frac{h(\lambda)}{k} - \frac{\lambda}{R}.$$

Now we must show that as k increases, $\lambda \to 0$. Recall that $\lambda$ is
defined as the probability of wrongly guessing the random variable $X = SJ$
when knowing Z and S. We note that when the random variable S is given to
Bob, only $2^r$ of the $2^{r+k}$ possible codewords are left to be guessed. Therefore,
$\lambda$ is the error probability of decoding a random coding scheme of rate less than
$r/n = I(X:Z)/n$. From the direct part of the channel coding theorem we know
that this probability goes to zero when n becomes large, and hence $\lambda \to 0$. It is
still necessary to prove that the stronger notion of secrecy $\lim_{n\to\infty} I(S:Z) = 0$
can also be achieved. To do so, we note that from the direct part of the noisy
coding theorem, $\lambda$ goes to zero exponentially with k, so if we multiply $H(S|Z)/k$
by k we are left with the inequality $H(S|Z) \ge k - h(\lambda) - \lambda k/R$. As $\lambda$
goes to zero exponentially, both $h(\lambda)$ and $\lambda k/R$ vanish, so $\lim_{n\to\infty}(k - H(S|Z)) = 0$;
since $H(S) = k$, this is exactly $\lim_{n\to\infty} I(S:Z) = 0$. The proof
of the next theorem follows from the previous arguments.
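The final step can be checked numerically. The inequality $H(S|Z) \ge k - h(\lambda) - \lambda k/R$ bounds the leakage $I(S:Z) = k - H(S|Z)$ by $h(\lambda) + \lambda k/R$. The sketch below evaluates that bound under a hypothetical exponential decay $\lambda = 2^{-k/4}$, chosen only for illustration: the argument above guarantees exponential decay but not this particular exponent. It also assumes $H(S) = k$ (uniform s). The function names are hypothetical.

```python
import math

def binary_entropy(p: float) -> float:
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def leakage_bound(k: int, rate: float, lam: float) -> float:
    """Upper bound on I(S:Z) = k - H(S|Z) implied by
    H(S|Z) >= k - h(lam) - lam * k / R (assuming H(S) = k)."""
    return binary_entropy(lam) + lam * k / rate
```

With R = 0.5 and the assumed decay, the bound falls from about 0.09 bits at k = 40 to below $10^{-6}$ bits at k = 160. This illustrates why exponential decay of $\lambda$, rather than mere convergence to zero, is needed for the stronger secrecy notion.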

Theorem 3. There exists a reduction of a string commitment scheme to any
$(\alpha,\beta)\text{-}\mathrm{bc}_{II}$ achieving a rate $R < 1 - \beta$ when $\alpha, \beta < 1$.

4 A Reduction to an $(\alpha,\beta)\text{-}\mathrm{bc}_{I}$ Protocol with $\alpha + \beta < 1$



Here, we prove that any $(\alpha,\beta)\text{-}\mathrm{bc}_{I}$ protocol with $\alpha + \beta < 1$ can be used to
implement strong string commitments achieving a rate $R = 1 - \alpha - \beta$. First we
prove that string commitments can be reduced to any $(\alpha,0)\text{-}\mathrm{bc}_{I}$ protocol with
$\alpha < 1$ achieving a rate $R = 1 - \alpha$. Again, the mappings E and D in this reduction
are given by a randomly generated linear code. Thus, a binary generating matrix
G (of dimensions $k \times n$, $k = Rn$, $R < 1 - \alpha$) is created at random. The mappings
are made available to both Alice and Bob. They are also supposed to know the
cheating probability $\alpha$. To commit to a uniformly chosen string $s = b_1 \ldots b_k$,
Alice chooses the bit string $x = x_1 \ldots x_n$ associated with $b_1 \ldots b_k$ and then runs
$\mathrm{Commit}_{I}(x_1 \ldots x_n)$. To open the commitment, Alice runs $\alpha\text{-}\mathrm{Open}_{I}(x_1 \ldots x_n)$. The
receiver accepts the commitment to the string $b_1 \ldots b_k = D(x_1 \ldots x_n)$ iff $x_1 \ldots x_n$ is
a valid codeword.

Proposition 6. All rates of reductions from $\mathrm{bc}(x_1 x_2 \ldots x_k)$ to $(\alpha,0)\text{-}\mathrm{bc}_{I}(y_1 y_2 \ldots y_n)$
which are below $1 - \alpha$ are achievable.

Proof. On average, Alice will be able to cheat in approximately $\alpha n$ (for large n)
weak commitments. Moreover, she knows in advance where she can cheat. She
will be able to break the security conditions of the strong string commitment if
she finds two codewords which have the same bits on all the $(1-\alpha)n$ positions
where she will not be able to change the weak commitment. The
probability that two such codewords exist can be upper bounded by
$2^{-(1-\alpha)n}\,2^{Rn}$, which can be made arbitrarily small as long as $R < 1 - \alpha$.
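The threshold $R < 1 - \alpha$ can be observed in a small simulation. A type I Alice equivocates if some nonzero codeword difference vanishes on every position she cannot change. The sketch below checks this by brute force for random codes and random cheatable sets of size $\alpha n$. These are simplifying assumptions: the cheatable set has exactly $\alpha n$ positions, and the code is drawn fresh in each trial. Function names are hypothetical.

```python
import itertools
import random

def alice_can_equivocate(rows, fixed_positions) -> bool:
    """True if two distinct messages yield codewords agreeing on every fixed
    position, i.e. some nonzero combination of rows is zero there."""
    mask = 0
    for i in fixed_positions:
        mask |= 1 << i
    for coeffs in itertools.product((0, 1), repeat=len(rows)):
        if not any(coeffs):
            continue
        diff = 0
        for c, row in zip(coeffs, rows):
            if c:
                diff ^= row
        if (diff & mask) == 0:
            return True
    return False

def equivocation_rate(n: int, k: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Fraction of random k x n codes (with a random cheatable set of size
    alpha * n) on which Alice could open two different strings."""
    rng = random.Random(seed)
    n_fixed = n - int(alpha * n)
    hits = 0
    for _ in range(trials):
        rows = [rng.getrandbits(n) for _ in range(k)]
        fixed = rng.sample(range(n), n_fixed)
        hits += alice_can_equivocate(rows, fixed)
    return hits / trials
```

As a usage example, take n = 20 and $\alpha = 0.5$. A rate R = 0.6 > 1 - α (k = 12) always allows equivocation, because 12 rows restricted to 10 fixed positions must be linearly dependent. A rate R = 0.15 (k = 3) almost never does, in line with the union bound $2^{3-10} = 1/128$.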

Observing that the arguments presented in Section 3.2 apply here straightforwardly,
we can then prove the following theorem:

Theorem 4. There exists a reduction of a string commitment scheme to any
$(\alpha,\beta)\text{-}\mathrm{bc}_{I}$ achieving a rate $R < 1 - \alpha - \beta$ when $\alpha + \beta < 1$.

5 Weak Commitments Based on Rabin Oblivious
Transfer and on 1-out-of-2 Oblivious Transfer

As an example, we shall describe a weak commitment based on Rabin oblivious
transfer. Our reduction, in spite of its simplicity, achieves a rate bounded away
from zero, thereby showing the generality of the theory developed in the previous
sections. Rabin oblivious transfer is a primitive where a sender (Alice) sends a
bit b to a receiver (Bob). Bob receives it with probability 1/2, or he receives an
erasure symbol $\Delta$ otherwise. The protocol is secure if Alice does not know whether Bob
received the bit or not. We describe a simple protocol to implement a weak bit
commitment from OT. To commit to a bit b, Alice sends the bit b through the
Rabin OT. In the opening phase Alice sends the bit b to Bob in the clear. If Bob
received an erasure $\Delta$ in the commit phase, Bob always accepts b. Otherwise,
Bob only accepts Alice's commitment if the bit she announces agrees
with the bit he received during the commit phase.
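A quick Monte Carlo check of the two probabilities claimed for this protocol follows. It assumes an ideal Rabin OT with erasure probability exactly 1/2, represents the erasure symbol $\Delta$ by `None`, and has a cheating Alice simply announce the complement of the bit she sent. All names are hypothetical.

```python
import random

ERASURE = None  # stands in for the erasure symbol Delta

def rabin_ot(bit: int, rng: random.Random):
    """Ideal Rabin OT: Bob gets the bit with probability 1/2, else an erasure."""
    return bit if rng.random() < 0.5 else ERASURE

def bob_accepts(received, announced: int) -> bool:
    """Opening test: accept after an erasure, otherwise require agreement."""
    return received is ERASURE or received == announced

def simulate(trials: int, seed: int = 0):
    """Estimate P[Bob learns b at commit time] and P[Alice opens 1 - b]."""
    rng = random.Random(seed)
    learned = cheated = 0
    for _ in range(trials):
        b = rng.randrange(2)
        received = rabin_ot(b, rng)
        learned += received is not ERASURE
        cheated += bob_accepts(received, 1 - b)
    return learned / trials, cheated / trials
```

In this model the two events are complementary: Alice can cheat exactly when Bob saw an erasure, so $\alpha + \beta = 1$. The primitive is still usable because Alice cannot tell in advance which case she is in. That makes it a weak commitment of type II rather than type I (compare Proposition 3).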

From the point of view of this paper, Rabin oblivious transfer can be seen as
a (1/2, 1/2)-weak bit commitment of type II: Alice does not know a priori
whether she will be able to cheat during the open phase or not; with probability
1/2 the receiver learns the commitment before the opening phase; and if the bit b
is erased during the commit phase, the sender can successfully open any bit to
the receiver. Hence we can apply the reduction of the previous sections, and any
rate below $R = 1 - 1/2 = 1/2$ can be achieved by this reduction.

Now, we show another example of weak commitment based on 1 out of 2 - 
OT. A (bo,bi) — OT{c) is a protocol where a sender inputs two bits bo and bi 
and a receiver inputs a bit c. At the end of the protocol the receiver gets be and 
the sender receives nothing. The protocol is secure if the sender learns nothing 
about c after an execution of it and if the receiver learns nothing about 6g- 

Our protocol is as follows: to commit to a bit $b$, Alice runs $(b \oplus r, r)$-OT$(c)$ with Bob, where $r$ and $c$ are chosen at random and $\oplus$ is the XOR operation. To unveil this bit, Alice sends the bits $b$ and $r$ to Bob in the clear. Bob accepts the commitment iff these bits are consistent with the information he received in the commit phase. This protocol is clearly concealing; however, Alice can cheat with probability at most 1/2. Thus we are left with a (1/2, 0)-weak bit commitment of type II. Hence, by applying the reduction of the previous section, any rate below $R = 1$ can be achieved by this reduction.
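The (1/2, 0) parameters can be checked exhaustively. The sketch below assumes an ideal 1-out-of-2 OT functionality, and its function names are illustrative. It shows that Bob's commit-phase view has the same distribution for $b = 0$ and $b = 1$, while a cheating Alice who wants to open $1 - b$ is accepted for exactly one of Bob's two equally likely choices $c$, whichever mask she reveals.

```python
from collections import Counter


def ot(b0, b1, c):
    """Idealized 1-out-of-2 OT: the receiver learns b_c and nothing else."""
    return b1 if c else b0


def commit(b, r, c):
    """Alice runs (b XOR r, r)-OT(c); Bob keeps his choice c and the bit he got."""
    return c, ot(b ^ r, r, c)


def bob_accepts(view, b, r):
    """Bob checks the revealed pair (b, r) against what he received."""
    c, received = view
    expected = r if c else b ^ r
    return received == expected


def best_cheat_probability(b, r):
    """Alice committed to b with mask r and tries to open 1 - b.
    Maximize, over the mask r' she reveals, the fraction of Bob's c that accept."""
    return max(
        sum(bob_accepts(commit(b, r, c), 1 - b, r_prime) for c in (0, 1)) / 2
        for r_prime in (0, 1)
    )


def view_distribution(b):
    """Distribution of Bob's commit-phase view for uniform r and c."""
    return Counter(commit(b, r, c) for r in (0, 1) for c in (0, 1))
```

Exhaustive enumeration is enough here because every random choice is a single bit.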
One can prove that a rate $R = 1/2$ is indeed optimal when reducing bit commitment to Rabin oblivious transfer [21]. Therefore, we observe that 1-out-of-2 OT is strictly more powerful than Rabin OT when it comes to implementing string commitments.

6 Improving a Computationally Secure Quantum Bit Commitment Protocol

In this section, we show how the proposed reductions can be used to turn a quantum bit commitment protocol based on any quantum one-way permutation into one that is robust against multiple photons. This is a rather surprising application of our reductions, since they are basically classical reductions. However, we show that some of our reductions can be proven secure against some quantum adversaries.

Quantum cryptography came onto the scene as a possible way to achieve secure bit commitment schemes based solely on the laws of physics. These hopes were ruled out when Mayers [13] and Lo and Chau [12] proved that any bit commitment protocol can be broken if the parties involved have unlimited quantum computational power. The main idea of the proof is that the existence of a secure quantum bit commitment protocol implies the existence of a secure purified protocol in which all actions are kept at the quantum level, so that the result of the protocol is a pure state shared between Alice and Bob. If such a scheme is concealing, i.e., the state in Bob's possession is approximately the same for a commitment to one and for a commitment to zero, then Alice can change the bit she committed to by applying a transformation $U$ to the qubits on her side. Hence no quantum bit commitment scheme can be both binding and concealing. Soon after the no-go theorem of Mayers and of Lo and Chau, the research community started looking for assumptions that could be used to implement quantum bit commitment protocols. Salvail [18] proved that secure bit commitment is possible under the assumption that some kind of measurement cannot be performed by one of the participants. In [10], Dumais, Salvail and Mayers generalized the idea of complexity-based cryptography from the classical to the quantum world. They proposed a protocol whose security is based on any quantum one-way permutation.

We briefly introduce their notation before reviewing their protocol. In the following we use the computational basis $\{|0\rangle, |1\rangle\}$ and the diagonal basis $\{|0\rangle_\times = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle),\ |1\rangle_\times = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\}$, denoted here by $+$ and $\times$ respectively. A bit string $y \in \{0,1\}^n$ encoded in the computational basis (diagonal basis) is represented by $|y\rangle_{\theta(0)^n}$ ($|y\rangle_{\theta(1)^n}$). Following [10], we write the quantum state $\bigotimes_{i=1}^{n} |y_i\rangle_{\theta(w_i)}$ as $|y\rangle_{\theta(w)}$. The following projections are used in our discussion: $P^{\theta(b)}_y = |y\rangle_{\theta(b)^n}\,{}_{\theta(b)^n}\langle y|$. $S = \{\sigma_n : \{0,1\}^n \to \{0,1\}^n \mid n > 0\}$ denotes a family of one-way permutations.

Dumais et al.'s protocol is as follows. To commit to a bit $w$, Alice chooses $x \in \{0,1\}^n$ and computes $y = \sigma_n(x)$. Alice then sends the quantum state $|\sigma_n(x)\rangle_{\theta(w)^n}$. To unveil the bit, Alice announces $w$ and $x$ to Bob. Bob measures his received state with the measurement $\{P^{\theta(w)}_y\}_{y \in \{0,1\}^n}$ and accepts the commitment iff the outcome of the measurement he performs is equal to $\sigma_n(x)$.

190 A.C.A. Nascimento, J. Mueller-Quade, and H. Imai

It is easy to see that the above protocol is unconditionally concealing (since the output of the permutation is completely random). Dumais et al. also proved that Alice could use a transformation $U$ that changes the value of a committed bit to efficiently invert the one-way permutation, which characterizes the computational binding property of the protocol. It is important to remark that the binding condition for quantum bit commitments differs from the one used for classical protocols. Requiring that the probability that Alice can successfully open two different commitments go to zero is too strong a requirement for quantum protocols, since Alice can always commit to a superposition of two valid commitments. Thus, it was suggested in [10] to classify as binding any quantum bit commitment protocol in which the probabilities of successfully opening zero and one sum to at most a value arbitrarily close to one in a security parameter. Therefore, we slightly change our definition of weak commitments of type II for quantum protocols.
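A classical simulation can make the honest commit/unveil flow of Dumais et al.'s protocol concrete. The Python sketch below only reproduces the measurement statistics of individually prepared qubits (deterministic in the preparation basis, a fair coin in the other basis). It does not model quantum cheating strategies such as the transformation $U$ on which the binding analysis rests, and `make_toy_permutation` is a hypothetical random table, not a one-way permutation.

```python
import random


def make_toy_permutation(n, seed=0):
    """A random permutation of {0,1}^n (as integers).
    Hypothetical stand-in: it is NOT one-way and only drives the simulation."""
    table = list(range(2 ** n))
    random.Random(seed).shuffle(table)
    return table.__getitem__


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def basis(w):
    """theta(w): rectilinear '+' for w = 0, diagonal 'x' for w = 1."""
    return "+" if w == 0 else "x"


def commit(w, x, sigma, n):
    """Classical record of |sigma(x)>_{theta(w)^n}: (preparation basis, bit) per qubit."""
    return [(basis(w), bit) for bit in to_bits(sigma(x), n)]


def measure(qubit, meas_basis, rng):
    """Single-qubit measurement statistics: deterministic in the preparation
    basis, a uniformly random bit in the other basis."""
    prep_basis, bit = qubit
    return bit if prep_basis == meas_basis else rng.randrange(2)


def unveil(qubits, w, x, sigma, n, rng):
    """Bob measures every qubit in theta(w) and accepts iff the outcome is sigma(x)."""
    outcome = [measure(q, basis(w), rng) for q in qubits]
    return outcome == to_bits(sigma(x), n)
```

In this simulation an honest unveil is always accepted, while naively announcing the other bit with the same $x$ is accepted only with probability $2^{-n}$.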

Definition 5. A quantum $(\alpha, \beta)$-$qbc(b)$ of type II, $(\alpha, \beta)$-$qbc_{II}(b)$ for short, is defined as a pair of algorithms $\beta$-$qCommit_{II}(b)$ and $\alpha$-$qOpen_{II}(b)$. In an $\alpha$-$qOpen_{II}(b)$ algorithm, the committer is able to commit to something and later unveil $b = 0$ and $b = 1$ with success probabilities whose sum is at most $1 + \alpha$. In a $\beta$-$qCommit_{II}(b)$ algorithm, Bob, the receiver, learns the value of a committed bit with probability at most $\beta$.



Thus, if $\alpha$ is negligible in a security parameter, our quantum weak commitment is binding. A quantum commitment is concealing if

$$\lim_{m \to \infty} \operatorname{Tr} |\rho_0 - \rho_1| = 0,$$

where Tr is the trace of a matrix, $\rho_0$ is the density matrix that represents a commitment to zero, $\rho_1$ is the density matrix that represents a commitment to one, and $m$ is a security parameter.

It was noted in [10] that if the photon source used in an implementation of the protocol is not perfect, i.e., if it emits multiple photons with some probability, the protocol is no longer concealing. Indeed, it is easy to see that, given a non-perfect photon source, the probability that Bob breaks the protocol approaches one as the security parameter $n$ increases. Finding a protocol robust against multiple photons was stated as an open problem in [10]. We use the theory of weak commitments to present a quantum bit commitment protocol, based on any quantum one-way permutation, that is robust against multiple photons. We model a realistic source that emits multiple photons with probability $p$ by an oracle that announces the value of the committed bit to Bob (before the unveiling phase) with probability $1 - (1 - p)^n$. Therefore, we have an $(\alpha, \beta)$-$qbc_{II}$ where $\alpha$ is negligible and $\beta = 1 - (1 - p)^n$.
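The effect of a multi-photon pulse can be checked exactly on a single position. The Python sketch below uses an idealization that is our assumption, not the model of [10]: a two-photon pulse is treated as two identical copies of the same state. With one photon, Bob's state is $I/2$ in both bases, so nothing leaks. With two photons, a Bell-basis measurement yields $|\Phi^-\rangle$ only for the rectilinear basis ($w = 0$) and $|\Psi^+\rangle$ only for the diagonal basis ($w = 1$), so such a pulse reveals $w$ with probability 1/2. The helper `oracle_probability` evaluates $\beta = 1 - (1 - p)^n$.

```python
from fractions import Fraction

HALF, ZERO, ONE = Fraction(1, 2), Fraction(0), Fraction(1)

# |y><y| for y in {0, 1} in the rectilinear (+) and diagonal (x) bases.
# Every entry is rational, so all comparisons below are exact.
PROJ = {
    ("+", 0): [[ONE, ZERO], [ZERO, ZERO]],
    ("+", 1): [[ZERO, ZERO], [ZERO, ONE]],
    ("x", 0): [[HALF, HALF], [HALF, HALF]],
    ("x", 1): [[HALF, -HALF], [-HALF, HALF]],
}

# Bell-state projectors on two qubits, basis order |00>, |01>, |10>, |11>.
PHI_MINUS = [[HALF, ZERO, ZERO, -HALF], [ZERO] * 4, [ZERO] * 4, [-HALF, ZERO, ZERO, HALF]]
PSI_PLUS = [[ZERO] * 4, [ZERO, HALF, HALF, ZERO], [ZERO, HALF, HALF, ZERO], [ZERO] * 4]


def kron(a, b):
    """Tensor (Kronecker) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)] for i in range(n * m)]


def bob_state(basis, photons):
    """Bob's state at one position: y uniform, `photons` identical copies per pulse."""
    terms = []
    for y in (0, 1):
        rho = PROJ[(basis, y)]
        for _ in range(photons - 1):
            rho = kron(rho, PROJ[(basis, y)])
        terms.append(rho)
    size = len(terms[0])
    return [[(terms[0][i][j] + terms[1][i][j]) / 2 for j in range(size)] for i in range(size)]


def outcome_probability(rho, proj):
    """Born rule: Tr(rho P) for a projector P."""
    n = len(rho)
    return sum(rho[i][j] * proj[j][i] for i in range(n) for j in range(n))


def oracle_probability(p, n):
    """beta = 1 - (1 - p)^n: probability that at least one of n pulses is multi-photon."""
    return 1 - (1 - p) ** n
```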

Before applying our reductions, we have to prove that using the above protocol repeatedly does not compromise its security. In the following, we denote Dumais et al.'s quantum bit commitment protocol performed $m$ times with perfect apparatus by $qbc(b_1 \ldots b_m)$, where $b_i$, $1 \le i \le m$, are the bits committed to in each execution of the protocol.

Proposition 7. Bob cannot distinguish between two commitments $qbc(b_1 \ldots b_m)$ and $qbc(w_1 \ldots w_m)$, where $b_1 \ldots b_m$ and $w_1 \ldots w_m$ are any $m$-bit strings.

Proof. For an honest Alice, the individual bit commitments in the string commitment are performed independently, and each of the states resulting from the individual commitments is, on Bob's side, independent of the committed bit. The product state that results from the string commitment is therefore easily seen to be independent of the committed string. The bit string commitment is perfectly concealing.

Proposition 8. Let $p_0$ and $p_1$ be the probabilities that Alice can successfully open two different commitments $qbc(b_1 \ldots b_m)$ and $qbc(w_1 \ldots w_m)$ to Bob, respectively. We have $\lim_{n \to \infty} (p_0 + p_1) = 1$, where $n$ is the dimension of the image of the one-way permutation $\sigma_n$.

Proof. We show that a cheating strategy for a bit string commitment based on the protocol of [10] would imply a cheating strategy for a single application of the protocol, which is equivalent to inverting the quantum one-way function. Assume Alice is able to change at least one commitment within a string of length $m$ with non-negligible probability. Then, instead of performing a single execution of the protocol of [10], Alice can perform a string commitment but send only one of the commitments (one that she can change with non-negligible probability) to Bob. As Bob cannot detect whether a bit Alice committed to is part of a string, this strategy allows Alice to cheat in a single round of the protocol.

One more point remains: could the information provided by the oracle help Bob break commitments in which no multiple photons are emitted? This is not the case, since the output of the one-way permutation is completely random and therefore makes each partial commitment $qbc(b_i)$ independent of the others. In other words, the density matrix of each unbroken partial commitment $qbc(b_i)$ does not change when other partial commitments are broken. Therefore, we are able to use our reduction techniques to cope with the multiple-photon problem in the scheme described above.

We ask Alice to use the same reduction that was used in Proposition 2: when the committed bit is $b$, Alice chooses a random $m$-bit word $b_1 \ldots b_m$ such that $b = b_1 \oplus b_2 \oplus \cdots \oplus b_m$, where $\oplus$ stands for the XOR operation. Alice sequentially commits to $b_i$, $1 \le i \le m$, using a $\beta$-$qCommit_{II}(b)$ protocol. By applying the reduction presented in Proposition 5 (which is basically a straightforward application of privacy amplification [1]) together with Propositions 7 and 8, we see that it is possible to have a binding ($\alpha$ negligible in a security parameter) and concealing ($\lim_{m \to \infty} \operatorname{Tr} |\rho_0 - \rho_1| = 0$) quantum bit commitment protocol based on a quantum one-way permutation, even when it is used with an imperfect photon source. We omit the proof of the next theorem.
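The XOR sharing step and its effect on Bob's information can be sketched in a few lines of Python. The sketch assumes, as argued above for Dumais et al.'s protocol, that the oracle breaks each partial commitment independently with probability $\beta$; Bob then learns $b$ only if all $m$ shares are broken, i.e. with probability $\beta^m$. The helper `shares_needed` (a hypothetical name) shows why the rate, one committed bit per $m$ partial commitments, decays: as $\beta$ approaches 1, more shares are needed for the same leakage bound.

```python
import random
from functools import reduce
from operator import xor


def xor_shares(b, m, rng):
    """Random bits b_1..b_m whose XOR equals b."""
    shares = [rng.randrange(2) for _ in range(m - 1)]
    shares.append(reduce(xor, shares, b))  # last share fixes the parity
    return shares


def leak_probability(beta, m):
    """Bob learns b only if every one of the m partial commitments is broken."""
    return beta ** m


def simulate_leak(beta, m, trials=200_000, seed=3):
    """Monte Carlo estimate of leak_probability under independent breaks."""
    rng = random.Random(seed)
    leaks = sum(all(rng.random() < beta for _ in range(m)) for _ in range(trials))
    return leaks / trials


def shares_needed(beta, eps):
    """Smallest m with beta**m <= eps (assumes 0 < beta < 1)."""
    m = 1
    while beta ** m > eps:
        m += 1
    return m
```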
Theorem 5. There exists a quantum bit commitment protocol based on any quantum one-way permutation which is binding and concealing even in the presence of multiple photons.

Satisfactory as this result is, it should be noted that the rate we achieve goes to zero in the limit of large $n$. It would be desirable to have a reduction that achieves a non-trivial rate. However, the reductions achieving non-trivial rates presented in this paper do not trivially generalize to the quantum world. One reason is that it is not completely obvious how to define the binding condition for quantum string commitments, since Alice can commit to a superposition of commitments to different strings. Recently, in a preprint [5], Crépeau, Dumais, Mayers and Smith proposed a reasonable definition for the binding condition of quantum string commitments, but whether Dumais et al.'s protocol satisfies this definition has yet to be proven (it was conjectured to be true in [5]). If the conjectures of [5] are proven correct, our non-trivial reductions should apply to Dumais et al.'s protocol, and non-trivial rates could be achieved. However, proving these statements is beyond the scope of this paper.

7 Conclusions 

We introduced a new class of primitives called weak commitments. We com- 
pletely characterized when bit commitments can be reduced to these primitives. 
To judge the efficiency of such a reduction we used the new concept of the rate 
of a reduction. The reductions presented exhibit a non-zero rate. Furthermore, 
we provided examples of how to implement these primitives based on oblivious 
transfer and on quantum mechanics. Several open questions were stated. 

We thank an anonymous referee who pointed out a mistake in the proof of Theorem 3. This research was partially funded by project PROSECCO of the IST-FET programme of the EC and by a project on “Research and Development on Quantum Cryptography” of the Telecommunications Advancement Organization as part of the programme “Research and Development on Quantum Communication Technology” of the Ministry of Public Management, Home Affairs, Posts and Telecommunications of Japan.
Abstract. Secure authentication frequently depends on the correct recognition of a user’s public key. When there is no certificate authority, this key is obtained from other users using a web of trust. If users can be malicious, trusting the key information they provide is risky. Previous work has suggested the use of redundancy to improve the trustworthiness of user-provided key information. In this paper, we address two issues not previously considered. First, we solve the problem of users who claim multiple, false identities, or who possess multiple keys. Second, we show that conflicting certificate information can be exploited to improve trustworthiness. Our methods are demonstrated on both real and synthetic PGP keyrings, and their performance is discussed.



1 Introduction 

Authentication is one of the most important objectives in information security. 
Public key cryptography is a common means of providing authentication. Some 
examples are X.509 [19] and PGP [20]. In the public key infrastructure, each user 
is associated with a public key, which is publicly available, and with a private 
key, which is kept secret by the user. A user signs something with her private 
key, and this signature can be authenticated using the user’s public key. 

The ability to exchange public keys securely is an essential requirement in this approach. Certificates are considered to be a good way to deliver public keys, and are widely used in today’s public key infrastructures. Intuitively, a certificate is a statement by an authority about a user’s public key. In a hierarchical system, such as X.509 [19], we usually assume that the certificates contain true information, because the authority is secured and trusted. In non-hierarchical systems, such as PGP keyrings [20], each user becomes an authority. Such systems are referred to as webs of trust. With webs of trust, it may be risky to expect all certificates to contain true information, because not all users are fully secured and trusted. Accepting a false public key (i.e., believing it contains true information) undermines the foundation of authentication. For this reason, a method that can be used to verify a user’s public key (i.e., to detect and reject false certificates) is very much needed.

DAAD19-02-1-0219, and by the National Science Foundation under grant CCR-

The goal of this paper is to develop a robust scheme to determine if a cer- 
tificate is trustworthy. Our method uses redundant information to confirm the 
trustworthiness of a certificate. Previous work [16] has shown that redundancy 
(in the form of multiple, independent certificate chains) can be used to enhance 
trustworthiness. We show that in some circumstances multiple certificate chains 
do not provide sufficient redundancy. This is because a single malicious user may 
(legitimately) possess multiple public keys, or (falsely) claim multiple identities, 
and therefore can create multiple certificate chains which seem to be indepen- 
dent. Our solution to this problem is to also consider identities when determining 
if certificate chains are independent. 

In addition, we investigate the implications of conflicting certificates (i.e., 
certificates which disagree about the public key of a user). Conflicts are simple 
to detect. We show that such conflicting information can be used to help identify 
malicious users. Based on that information, the number of certificates which can 
be proved to be true is increased, improving the performance of our method. 

The organization of the paper is as follows. Section 2 summarizes related work. Section 3 defines our problem precisely. Section 4 presents solutions without using conflicting certificate information. Section 5 shows how conflicting certificate information may be used to improve performance. Section 6 presents our experimental results, and Section 7 concludes.



2 Related Work 

In addition to X.509 and PGP, discussed above, there are other public key infras- 
tructures, such as SPKI/SDSI [6] and PolicyMaker [3]. These mainly focus on 
access control issues. They differ from X.509 and PGP in that they bind access 
control policies directly to public keys, instead of to identities. 

Existing work on improving the trustworthiness of webs of trust can be clas- 
sified into two categories. In the first category, partial trust is used to determine 
trustworthiness of a target public key. In [18] and [13], this trustworthiness is 
based on the trust value in a single certificate chain. Multiple certificate chains 
are used in [2] and [11] to boost confidence in the result. [9] tries to reach a 
consensus by combining different beliefs and uncertainties. In [15], insurance, 
which may be viewed as a means of reducing risk, is used to calculate the trust- 
worthiness of a target public key. 

In the second category of methods, there is no partial trust; a key is either 
fully trustworthy, or else it is untrustworthy. In this category, an upper bound on 
the number of participants that may be malicious is assumed. An example is [4], 
which requires a bound on the minimum network connectivity in order to reach 
a consensus. Another important method in this category is [16]. This method 
suggests using multiple public key-independent certificate chains to certify the 
same key. The authors showed that if at most n public keys are compromised, a 
public key is provably correct if there are n + 1 or more public key-independent certificate chains certifying it. The number of independent chains is computed by computing network flow in the certificate graph.

Methods in the first category (partial trust) are based on probabilistic mod- 
els. We believe such models are more appropriate for computing the reliability 
of faulty systems than they are for computing trustworthiness of information 
provided by users. For example, such approaches require proper estimation of 
the trustworthiness of each user. If such estimates are incorrect, or a trusted user 
is compromised, then the output produced by these methods will be misleading. 
We believe methods in the second category are more suitable for the case of 
(intentionally) malicious users. That is, it should be much easier to bound the 
number of users who are malicious than to specify how trustworthy each user is. 

A limitation of methods in the second category is that the importance of 
identities, as well as public keys, has not been fully considered. That is, these 
methods have not considered the possibility that each user may claim multiple 
identities, or possess multiple public keys. 

All methods for distributed trust computation assume there is some initial trust between selected users. Without such initial trust, there is no basis for any users to develop trust in one another. In [5], it is shown that forging multiple identities is always possible in a decentralized system. We assume that the initial trust is negotiated out of band, separately from the distribution of trust (for example, by direct connection, or by communication with a trusted third party), and that proof of identity is available during this initial phase.

The next section presents definitions and assumptions, and a statement of 
the problems to be solved. 

3 Problem Statement 

A user is an entity in our system represented by an identity (such as the names 
“Bob” and “Alice”). An identity must be established when the initial trust in- 
formation is negotiated between users. We assume in this work that each user 
legitimately has exactly one, unique true (or valid) identity. An identity which 
does not belong to a real user is a false identity. 

We further assume each user can have, or be associated with, one or more 
public keys. In the case where a user has more than one public key, we assume the 
user further specifies each of her keys by a key index number. The combination 
of a user identity x and a key index number j is denoted x/j, and uniquely 
identifies a true public key. If the user with identity x only has a single public 
key, j will be omitted for the sake of convenience. 

Our definitions of public key certificate and certificate chain follow [14]. A public key certificate is a triple $(x/j, k, S_{k'})$, where $x$ is an identity, $j$ is a key index number, $k$ is a public key, and $S_{k'}$ is the digital signature over the combination of $x/j$ and $k$. Given a certificate $C = (x/j, k, S_{k'})$, if (i) the identity $x$ is a true identity, and (ii) the user with identity $x$ says $k$ is her public key, then $C$ is a true certificate and $k$ is a true public key for $x$. Otherwise, $C$ is a false certificate and $k$ is a false public key for $x$. If $S_{k'}$ is generated by $y$, we say the certificate is issued by $y$. If all certificates issued by $y$ are true certificates, then $y$ is a good user. If there exists at least one false certificate issued by $y$, then $y$ is a malicious user.

Two certificates are said to agree with each other if the identities, key index 
numbers and public keys are the same. Two certificates are called conflicting 
certificates if the identities and key index numbers are the same, but the public 
keys are different. Note that the two conflicting certificates may both be true 
by our definition (i.e., the user with the corresponding identity says both of the 
two keys are her public keys, with the same index number). This may happen 
when a user x intentionally has two conflicting certificates issued to herself, by 
two separate parties, for the purpose of possessing more public keys. We expand 
our definition of a malicious user to also include x in such a case; the issuers of 
the conflicting certificates, however, are not considered to be malicious on this 
count, and the certificates are defined to be true. 
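Agreement and conflict depend only on the identity, key index and public key, not on who issued the certificate. A minimal Python sketch, assuming keys are represented as opaque strings and signatures are not modeled (the class and function names are illustrative):

```python
from collections import defaultdict
from typing import NamedTuple


class Certificate(NamedTuple):
    identity: str    # x
    key_index: int   # j
    public_key: str  # k (an opaque string in this sketch)
    signer_key: str  # k', the key that produced the signature S_k'


def agree(c1, c2):
    """Same identity, key index and public key."""
    return (c1.identity, c1.key_index, c1.public_key) == (c2.identity, c2.key_index, c2.public_key)


def conflicting(c1, c2):
    """Same identity and key index, but different public keys."""
    return (c1.identity, c1.key_index) == (c2.identity, c2.key_index) and c1.public_key != c2.public_key


def conflicts(certs):
    """Map each label x/j bound to more than one key to the set of those keys."""
    keys_by_label = defaultdict(set)
    for c in certs:
        keys_by_label[(c.identity, c.key_index)].add(c.public_key)
    return {label: keys for label, keys in keys_by_label.items() if len(keys) > 1}
```

Grouping by label makes conflict detection a single pass over the certificates.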

Each user x may accumulate a set of certificates about other users. Obtaining these certificates may be done in a variety of ways. How certificates are distributed is an open problem that we do not discuss further in this paper.

In each user’s set of certificates, some are assumed by x to be true and others are not. Denote by $T_0^x$ the set of certificates assumed by x to be true initially (i.e., those provided by means of the initial trust distribution). Because this initial trust information is assumed to be true, the signatures on the certificates in $T_0^x$ do not have to be further verified. A certificate chain is a sequence of certificates where:

1. the starting certificate, which is called the tail certificate, is assumed or 
determined to be true; 

2. each certificate contains a public key that can be used to verify the digital 
signature associated with the next certificate in the sequence; and, 

3. the ending certificate, which is called the head certificate, contains a desired 
name-to-key binding, which is called the target. 
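The three chain conditions can be checked mechanically. This Python sketch abstracts signature verification away, which is an assumption of the sketch: a key $k$ is taken to verify the next certificate exactly when that certificate names $k$ as its signing key. All names are illustrative.

```python
from typing import NamedTuple


class Certificate(NamedTuple):
    identity: str
    key_index: int
    public_key: str
    signer_key: str  # key whose signature S_k' is attached


def is_certificate_chain(chain, known_true, target):
    """known_true: certificates assumed or determined to be true.
    target: the desired (identity, key_index, public_key) binding."""
    if not chain or chain[0] not in known_true:     # 1. tail certificate is true
        return False
    for prev, nxt in zip(chain, chain[1:]):         # 2. each key verifies the next signature
        if prev.public_key != nxt.signer_key:
            return False
    head = chain[-1]                                 # 3. head carries the target binding
    return (head.identity, head.key_index, head.public_key) == target
```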

Each user x’s set of certificates $R^x$ may be represented by a directed certificate graph $G^x(V^x, E^x)$, where $V^x$ and $E^x$ denote the set of vertexes and the set of edges in the certificate graph $G^x$, respectively. A vertex in $V^x$ represents a public key and an edge in $E^x$ represents a certificate. There is a directed edge labeled with $y/j$ from vertex $k'$ to vertex $k$ in $G^x$ if and only if there is a certificate $(y/j, k, S_{k'})$ in $R^x$. A certificate chain is represented by a directed path in the certificate graph. Two conflicting certificates are represented by two edges with the same label but different head vertexes. In this case, the two different head vertexes are called conflicting vertexes.

For the sake of simplicity, we add to the certificate graph an “oracle” vertex $k_0$. There is a directed edge in $E^x$ from $k_0$ to every key which is assumed to be true (by way of the initial trust distribution), labeled with the identity/index number bound to that key.
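The graph construction, including the oracle vertex and the detection of conflicting vertexes, can be sketched in Python as follows. Edges are represented as (tail key, head key, label) triples with labels as (identity, index) pairs; this representation and the names are assumptions of the sketch.

```python
from collections import defaultdict
from typing import NamedTuple

ORACLE = "k0"  # the added oracle vertex


class Certificate(NamedTuple):
    identity: str
    key_index: int
    public_key: str
    signer_key: str


def build_certificate_graph(certs, initially_true):
    """Edge set of G^x.
    certs: all certificates R^x; each yields an edge k' -> k labeled y/j.
    initially_true: certificates in T_0^x; each also yields an edge k0 -> k."""
    edges = {(c.signer_key, c.public_key, (c.identity, c.key_index)) for c in certs}
    edges |= {(ORACLE, c.public_key, (c.identity, c.key_index)) for c in initially_true}
    return edges


def conflicting_vertexes(edges):
    """Labels whose edges point to more than one head vertex, with those vertexes."""
    heads = defaultdict(set)
    for _tail, head, label in edges:
        heads[label].add(head)
    return {label: hs for label, hs in heads.items() if len(hs) > 1}
```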

To depict true and false public keys in the certificate graph, we “paint” vertexes with different colors. A white vertex represents a public key that is either assumed or determined to be true for the identity on the edges directed towards that vertex. A dark gray vertex represents a public key which is known to be false for the corresponding identity. If a public key is neither assumed to be true nor proved (yet) to be true for the corresponding identity, we paint it light gray.

Fig. 1. User x’s certificate graph

Figure 1 is a sample certificate graph. $k_0$ is the oracle vertex, and $k_1$, $k_2$ and $k_3$ are three public keys that are assumed to be true by user x. $k_1$ and $k_2$ are two conflicting vertexes because the labels on their incoming edges are the same. kg is z’s public key, with index number 1, and k^ is ^’s public key, with index number 2. kg is a false public key.

When there is more than one malicious user, there may be a relationship 
between these users. Two malicious users who cooperate with each other to 
falsify information are said to be colluding. We say that two users x and y are 
colluding if either: (i) there exist two false certificates, one issued by x and one 
by y, and they agree with each other; or, (ii) x issues a false certificate upon y’s 
request, or vice versa. 

3.1 Problem Description 

We now define the problems to be solved. First we consider the case in which 
malicious users do not collude, followed by the colluding case. The goals are the 
same, that is, to maximize the number of certificates which can be proved to be 
true. 

Problem 1 Given a set $R^x$ of certificates and a set $T_0^x \subseteq R^x$ of true certificates. Assuming there are at most n malicious users, and these users do not collude, maximize the number of certificates which can be proved to be true.

Problem 2 Given a set $R^x$ of certificates and a set $T_0^x \subseteq R^x$ of true certificates. Assuming there are at most n malicious users and these users may collude, maximize the number of certificates which can be proved to be true.

It is necessary to have a metric to evaluate the performance of proposed solutions for the above problems. Let U be the set of all users. Denote by $K^x$ the set of true public keys that can be reached by at least one path from the oracle vertex $k_0$ in the certificate graph for user x. $K^x$ is the maximum set of true public keys that any method could determine to be true by means of certificate chains in this graph. Let $K_0^x$ be the set of public keys that are assumed to be true initially by user x, and $K_M^x$ the set of public keys that are determined to be true by a method M. Ideally, $K_M^x$ would be equal to $K^x$, but in practice this may be difficult to achieve. The metric definition now follows.



Definition 1 Given a solution s to Problem 1 or 2 generated by a method M, the performance $q_x(s)$, i.e., s’s performance for user x, is

$$q_x(s) = \frac{|K_M^x - K_0^x|}{|K^x - K_0^x|}.$$

To capture the performance for a set of users, we propose using the weighted average of each user’s performance:

$$q(s) = \frac{\sum_{x \in U} |K^x - K_0^x| \, q_x(s)}{\sum_{x \in U} |K^x - K_0^x|}.$$
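Given the per-user key sets, the metric is simple set arithmetic. The Python sketch below assumes that the per-user performance is $|K_M^x - K_0^x| / |K^x - K_0^x|$ and that the weight of user x is $|K^x - K_0^x|$; users with nothing left to determine get weight 0 and are skipped. The function names are illustrative.

```python
def user_performance(K, K0, KM):
    """Assumed form: |K_M^x - K_0^x| / |K^x - K_0^x| (K^x - K_0^x must be non-empty)."""
    return len(KM - K0) / len(K - K0)


def overall_performance(users):
    """Weighted average of user_performance, with assumed weight |K^x - K_0^x|.
    users: dict mapping a user to (K, K0, KM), each a Python set of keys."""
    weighted_sum = total_weight = 0
    for K, K0, KM in users.values():
        weight = len(K - K0)
        if weight:  # users with K^x - K_0^x empty contribute nothing
            weighted_sum += weight * user_performance(K, K0, KM)
            total_weight += weight
    return weighted_sum / total_weight
```

With these weights the average reduces to the total number of newly determined keys divided by the total number of determinable keys.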



4 Solutions to Problems 1 and 2 



In this section, we present methods for solving problems 1 and 2 under two 
assumptions: (a) when there is only one public key per identity, and (b) when 
there may be multiple public keys per identity. We assume there is an upper 
bound on the number of users who may be malicious, and use redundancy to 
determine the certificates that must be true. We ignore in this section the case 
in which there are conflicting certificates, which is considered in section 5. 

Two certificate chains are public key-independent if their head certificates 
agree, and their remaining certificates have no public keys in common. Two 
certificate chains are identity-independent if their head certificates agree, and 
their remaining certificates have no identities in common. We state the following 
theorems without proof (see [8]). 
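The difference between the two notions of independence shows up as soon as one user holds two keys. In this Python sketch (illustrative names, certificates as plain records), two chains pass through user z's keys z/1 and z/2: they share no public key, so they are public key-independent, but they share the identity z, so they are not identity-independent.

```python
from typing import NamedTuple


class Certificate(NamedTuple):
    identity: str
    key_index: int
    public_key: str
    signer_key: str


def _heads_agree(chain1, chain2):
    h1, h2 = chain1[-1], chain2[-1]
    return (h1.identity, h1.key_index, h1.public_key) == (h2.identity, h2.key_index, h2.public_key)


def key_independent(chain1, chain2):
    """Heads agree and the remaining certificates share no public key."""
    keys1 = {c.public_key for c in chain1[:-1]}
    keys2 = {c.public_key for c in chain2[:-1]}
    return _heads_agree(chain1, chain2) and not keys1 & keys2


def identity_independent(chain1, chain2):
    """Heads agree and the remaining certificates share no identity."""
    ids1 = {c.identity for c in chain1[:-1]}
    ids2 = {c.identity for c in chain2[:-1]}
    return _heads_agree(chain1, chain2) and not ids1 & ids2
```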

Theorem 1 Given two identity-independent certificate chains and any number 
of non-colluding malicious users, the head certificates must be true. 

In the case of multiple colluding malicious users, a greater degree of redundancy 
is needed to verify a certificate is true: 

Theorem 2 Given $n + 1$ identity-independent certificate chains, if there are at most n colluding malicious users, then the head certificates must be true.

Based on these results, we now present methods for maximizing the number 
of true certificates (when there are no conflicting certificates). 
4.1 Maximum of One Public Key per Identity

Reiter and Stubblebine [16] considered and solved this problem. We summarize their results for the convenience of the reader. When each identity corresponds to only one public key, public key-independent certificate chains are also identity-independent. In the certificate graph, public key-independent certificate chains correspond to paths with the same tail vertex and the same head vertex, but which are otherwise vertex-disjoint. For each vertex $k_t$, it is possible to use standard algorithms for maximum network flow [1] in a unit-capacity network to find the maximum number of vertex-disjoint paths from $k_0$ to $k_t$, in running time $O(|V^x||E^x|)$.

It can be shown that all certificates (edges) ending at $k_t$ must be true if: (i) there are any number of non-colluding malicious nodes, and the maximum flow from $k_0$ to $k_t$ is greater than or equal to 2; or, (ii) the number of (possibly colluding) malicious users is no greater than n and the maximum flow from $k_0$ to $k_t$ is greater than or equal to $n + 1$.
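The flow computation of this subsection can be sketched as a unit-capacity maximum flow with vertex splitting, a standard reduction from vertex-disjoint paths to flow. The implementation below (Edmonds–Karp with BFS augmentation) is our own illustration, not code from [16] or [1]. Edges are given as (signer key, subject key) pairs and labels are ignored, which is only appropriate under this subsection's one-key-per-identity assumption.

```python
from collections import defaultdict, deque


def max_vertex_disjoint_paths(edges, source, target):
    """Maximum number of internally vertex-disjoint source -> target paths.
    Each vertex other than source/target is split into an 'in' and an 'out'
    copy joined by a unit-capacity arc; every graph edge also has capacity 1."""
    cap = defaultdict(int)
    adj = defaultdict(set)

    def node(v, side):
        return v if v in (source, target) else (v, side)

    def add_arc(u, v):
        cap[(u, v)] = 1
        adj[u].add(v)
        adj[v].add(u)  # reverse direction, used by the residual graph

    vertices = set()
    for u, v in edges:
        vertices.update((u, v))
        add_arc(node(u, "out"), node(v, "in"))
    for v in vertices - {source, target}:
        add_arc((v, "in"), (v, "out"))

    flow = 0
    while True:
        parent = {source: None}  # BFS for an augmenting path
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if target not in parent:
            return flow
        w = target
        while parent[w] is not None:  # push one unit of flow along the path
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


def edges_into_target_true(flow, colluding_bound=None):
    """Flow >= 2 for non-colluding malicious users; flow >= n + 1 if at most
    n (possibly colluding) users are malicious."""
    if colluding_bound is None:
        return flow >= 2
    return flow >= colluding_bound + 1
```

Splitting each vertex is what turns edge capacities into vertex capacities: two paths through the same key would both need its single unit-capacity internal arc.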



4.2 Multiple Public Keys Allowed per Identity

We now address the case in which an identity may be associated with multiple public keys in x’s certificate graph. For example, in Figure 1 there are two vertex-disjoint paths from $k_0$ to k^. However, user z has two public keys, and the maximum number of identity-independent paths to ke is only 1. Therefore, it is not safe to conclude that ke is u’s true key if z may be malicious. This issue has not previously been addressed.

We still wish to use the notion of redundant, identity-independent paths to 
nullify the impact of malicious users. To ensure that two paths are identity- 
independent under the new assumption, it is necessary that the two paths have 
no label in common on their edges. In this case the paths are said to be label- 
disjoint. 

Suppose there exists a solution to the problem of determining the maximum label-disjoint network flow in the graph with unit-capacity edges, from $k_0$ to a vertex $k_t$. We conclude (by the reasoning previously given) that $k_t$ is a true public key for the identity on the edges in the maximum flow ending at $k_t$ if:

1. the maximum flow is 2 or greater, and there is at most 1 malicious node, or any number of non-colluding malicious users; or,

2. the maximum flow is $n + 1$ or greater, and there are at most n colluding malicious users.

We now state a theorem about the complexity of finding the maximum label- 
disjoint network flow in a directed graph: 

Theorem 3 Given a certificate graph $G^x$ and a vertex $k_t$, if one identity may legitimately be bound to multiple public keys, the problem of finding the maximum number of label-disjoint paths in $G^x$ from $k_0$ to $k_t$ is NP-complete.
The proof can be found in [8]. Therefore, solving this problem exactly is computationally expensive in the worst case.

A heuristic for this problem follows an idea from [16]. The problem of finding the maximum number of label-disjoint paths between $k_0$ and $k_t$ can be transformed to the maximum independent set (MIS) problem [7]. The MIS problem is defined as follows: given a graph G, find a maximum subset of the vertices of G such that no two of them are joined by an edge. The transformation from our problem is straightforward, and can be found in [8]. The size of the maximum independent set in the transformed graph is the maximum number of label-disjoint paths from $k_0$ to $k_t$ in $G_x$. Although MIS is also an NP-complete problem, several well-known approximation algorithms exist for it [10].
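One way to picture the transformation is sketched below in Python. This is a minimal illustration, not necessarily the construction of [8]: it assumes the candidate paths from $k_0$ to $k_t$ have already been enumerated (which can itself take exponential time) and are given as lists of edge labels. Each path becomes a vertex of a conflict graph, and two paths are joined when they share a label, so an independent set is exactly a set of pairwise label-disjoint paths. The minimum-degree greedy rule is one simple heuristic chosen here for illustration and is not claimed to be one of the algorithms of [10]; the names `conflict_graph` and `greedy_mis` are hypothetical.

```python
from itertools import combinations


def conflict_graph(paths):
    """Map each path index to the indices of paths sharing at least one label."""
    label_sets = [set(p) for p in paths]
    adj = {i: set() for i in range(len(paths))}
    for i, j in combinations(range(len(paths)), 2):
        if label_sets[i] & label_sets[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_mis(adj):
    """Minimum-degree greedy heuristic; returns an independent set, not always a maximum one."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    chosen = []
    while remaining:
        # Pick the vertex with the fewest remaining neighbours (ties broken by index).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return sorted(chosen)


# Each path is the list of labels on its edges (hypothetical example data).
paths = [["y1", "y2"], ["y3", "y4"], ["y2", "y5"], ["y6"]]
print(greedy_mis(conflict_graph(paths)))  # [0, 1, 3]
```

The indices returned identify a set of label-disjoint paths; its size is a lower bound on the true maximum.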

Alternatively, an exact solution may be computationally tractable if the required number of label-disjoint paths is small. Suppose we wish to determine whether there exist at least b label-disjoint paths in $G_x$ from $k_0$ to $k_t$. The maximum number of label-disjoint paths from $k_0$ to $k_t$ equals the size of the minimum label-cut for $(k_0, k_t)$. A label-cut is a set of labels on edges whose removal would disconnect $k_t$ from $k_0$. If we enumerate all subsets of labels in $G_x$ with b or fewer labels, and no label-cut with b or fewer labels exists, then $k_t$ is determined to be true. The algorithm runs in $O(|K|\,|Y|^{b})$ time (proof omitted).
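The bounded enumeration can be sketched as follows, assuming (hypothetically) that the certificate graph is given as a list of `(issuer_key, subject_key, label)` triples. Each candidate cut of at most b labels is tested with a breadth-first search that ignores every edge carrying one of those labels; `no_label_cut_up_to` returns `True` exactly when no such cut exists, which is the condition under which the text declares $k_t$ true. The function names are illustrative only.

```python
from collections import deque
from itertools import combinations


def reachable(edges, source, target, banned):
    """BFS from source to target over edges whose label is not in `banned`.

    `edges` is an iterable of (issuer_key, subject_key, label) triples.
    """
    succ = {}
    for u, v, label in edges:
        if label not in banned:
            succ.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def no_label_cut_up_to(edges, k0, kt, b):
    """True if no set of at most b labels disconnects kt from k0."""
    labels = sorted({label for _, _, label in edges})
    for size in range(b + 1):
        for cut in combinations(labels, size):
            if not reachable(edges, k0, kt, set(cut)):
                return False  # found a label-cut with at most b labels
    return True


# Three chains k0 -> kt whose label sets are {A, X}, {B, Y} and {C, X}.
edges = [("k0", "a", "A"), ("a", "kt", "X"),
         ("k0", "b", "B"), ("b", "kt", "Y"),
         ("k0", "c", "C"), ("c", "kt", "X")]
print(no_label_cut_up_to(edges, "k0", "kt", 1))  # True: no single label cuts kt off
print(no_label_cut_up_to(edges, "k0", "kt", 2))  # False: {B, X} is a label-cut
```

The cost is dominated by the number of label subsets of size at most b, which is why the approach is only attractive for small b.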

In this section, we solved problems 1 and 2, without considering the possi- 
bility of conflicting certificates. We now turn to this problem. 

5 Dealing with Conflicting Certificates

We assume that conflicting certificates occur because of malicious intent, and not by accident. A malicious user may create conflicting certificates for several reasons. For example, she may attempt to fool user x into believing a false public key is true by creating multiple public key-independent certificate chains to the false public key. In this case, the method of section 4.2 can first be applied to determine the set of true certificates.

However, we can exploit the existence of conflicting certificates to prove that an even larger number of certificates are true. Stubblebine and Reiter [16] pointed out that conflicting certificates represent important information, but did not suggest how they could be used. We propose below a method of doing so, based on the notion of the suspect set:

Definition 2 A suspect set is a set of identities that contains at least one ma- 
licious user. A member of the suspect set is called a suspect. 

We now describe how to construct suspect sets of minimum size, and how they 
can be used to determine more true certificates. 

5.1 Constructing Suspect Sets (Non-colluding Malicious Users) 

Suppose we know, or have determined by some means, that a certificate is false. If there is only a single malicious user, or multiple malicious users who are non-colluding, the true identity of the issuer of this false certificate must appear in every chain ending with a certificate that agrees with this false certificate. Using this insight, we propose to construct suspect sets using the following algorithm, which takes a certificate graph as input.



Algorithm 1 Constructing suspect sets
1: for each label $y_j$ in the certificate graph do
2:    for each dark gray vertex $k_i$ whose incoming edge has label $y_j$ do
3:       construct a new suspect set consisting of the set of labels, each of which is a label-cut for $(k_0, k_i)$
4:    end for
5:    for each light gray vertex $k_i$ with an incoming edge labeled $y_j$, conflicting with a white vertex with an incoming edge labeled $y_j$ do
6:       construct a new suspect set consisting of {the set of labels each of which is a label-cut for $(k_0, k_i)$} ∪ {$y_j$}
7:    end for
8:    for each pair $k_i$, $k_h$ of light gray vertices whose incoming edges both have label $y_j$ do
9:       construct a new suspect set consisting of {the set of labels each of which is a label-cut for either $(k_0, k_i)$ or $(k_0, k_h)$} ∪ {$y_j$}
10:   end for
11: end for
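A minimal Python sketch of Algorithm 1 follows. It assumes the graph is a list of `(issuer_key, subject_key, label)` triples and that `color` maps every certified key to `"white"` (true), `"light"` (undetermined) or `"dark"` (false); this encoding of the vertex colours is our reading of the algorithm, not something the text specifies. Because step 6 adds $y_j$ explicitly, we also assume that a label-cut for $(k_0, k_i)$ is computed without deleting the head certificates that end at $k_i$ themselves. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def reaches(edges, k0, ki, banned):
    """BFS from k0 to ki, dropping edges labeled `banned` unless they end at ki."""
    succ = {}
    for u, v, label in edges:
        if label != banned or v == ki:
            succ.setdefault(u, []).append(v)
    seen, queue = {k0}, deque([k0])
    while queue:
        u = queue.popleft()
        if u == ki:
            return True
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def issuer_label_cuts(edges, k0, ki):
    """Labels that, on their own, form a label-cut for (k0, ki)."""
    labels = {label for _, _, label in edges}
    return {y for y in labels if not reaches(edges, k0, ki, y)}


def suspect_sets(edges, color, k0):
    """Steps 1-11 of Algorithm 1 (non-colluding malicious users)."""
    keys_by_label = {}
    for _, v, label in edges:
        keys_by_label.setdefault(label, set()).add(v)
    result = []
    for yj, keys in sorted(keys_by_label.items()):            # step 1
        dark = sorted(k for k in keys if color[k] == "dark")
        light = sorted(k for k in keys if color[k] == "light")
        has_white = any(color[k] == "white" for k in keys)
        for ki in dark:                                        # steps 2-4
            result.append(issuer_label_cuts(edges, k0, ki))
        if has_white:                                          # steps 5-7
            for ki in light:
                result.append(issuer_label_cuts(edges, k0, ki) | {yj})
        for ki, kh in combinations(light, 2):                  # steps 8-10
            result.append(issuer_label_cuts(edges, k0, ki)
                          | issuer_label_cuts(edges, k0, kh) | {yj})
    return result


# M (certified by both A and B) issues a false key kF for T; B issued kG for T.
edges = [("k0", "kA", "A"), ("k0", "kB", "B"), ("kA", "kM", "M"),
         ("kB", "kM", "M"), ("kM", "kF", "T"), ("k0", "kT", "T"),
         ("kB", "kG", "T")]
color = {"kA": "white", "kB": "white", "kM": "white", "kT": "white",
         "kF": "dark", "kG": "light"}
print(suspect_sets(edges, color, "k0"))
```

In this example the false certificate yields the suspect set {M}, and the undetermined certificate for T that conflicts with the true one yields {B, T}.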



The intuition behind Algorithm 1 is as follows. For each certificate known to be false, the malicious user’s true identity must be a label-cut between $k_0$ and the false certificate. For each certificate conflicting with a true certificate, a malicious user has either purposely requested that the conflicting certificate be issued to herself, or has issued the conflicting certificate. For each pair of conflicting but undetermined certificates (neither known to be true), we must consider it possible that either one, both, or neither of them is true. The complexity of Algorithm 1
is 0(|Up|if|) (proof omitted). We now explain how suspect set information can 
be useful. 

5.2 Exploiting Suspect Sets (Non-colluding Malicious Users) 

Consider the case where there is a single malicious user. Let $L_s$ represent the intersection of all the suspect sets generated by Algorithm 1. Clearly the single malicious user’s identity is in $L_s$. If, on the other hand, there may be up to b non-colluding malicious users, we must determine the maximum number of pairwise disjoint sets (MDS) among all the suspect sets generated by Algorithm 1. Two sets are called disjoint if they do not have any members (labels) in common. Suppose a solution to MDS consists of m suspect sets, and m = b. Let $L_m$ be the union of these m sets. Since each of the b disjoint sets contains at least one malicious user, and no user can belong to two of them, all b malicious users must be in $L_m$.

Unfortunately, MDS is also NP-complete, by transformation from the maximum independent set problem (proof omitted). As a result, in practice the solution can only be approximated.
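The two set operations used in this section can be sketched as follows. The intersection covers the single-malicious-user case; for MDS, the text only states that it must be approximated, so the smallest-first greedy scan below is a heuristic chosen here for illustration, not the authors' method. Function names and example data are hypothetical.

```python
from functools import reduce


def intersect_all(suspect_sets):
    """Identities common to every suspect set (single-malicious-user case)."""
    if not suspect_sets:
        return set()
    return set(reduce(lambda x, y: x & y, suspect_sets))


def greedy_disjoint_sets(suspect_sets):
    """Greedy MDS approximation: scan sets smallest first and keep each one
    that is disjoint from all sets kept so far. Returns (kept sets, union)."""
    kept, union = [], set()
    for s in sorted(suspect_sets, key=lambda s: (len(s), sorted(s))):
        if not (s & union):
            kept.append(s)
            union |= s
    return kept, union


sets = [{"M", "A"}, {"M"}, {"B", "T"}, {"T", "C"}]
kept, union = greedy_disjoint_sets(sets)
print(len(kept), sorted(union))  # 2 ['B', 'M', 'T']
```

When the number of kept sets equals b, the returned union plays the role of $L_m$; a greedy answer with fewer sets simply gives a weaker conclusion.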
Given $L_s$ or $L_m$, we can prove more certificates true as follows. For each undetermined public key $k_t$, if there is only one malicious user, we simply test whether any single label in $L_s$ is a label-cut for $(k_0, k_t)$ in the certificate graph. If not, $k_t$ is determined to be true. If there are multiple non-colluding malicious users, we test whether any single label in $L_m$ is a label-cut for $(k_0, k_t)$. If not, $k_t$ is determined to be true. For this computation, a modified breadth-first search suffices, with a complexity of $O(b\,|Y|\,|K|)$.
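The test just described can be sketched with one breadth-first search per suspect label. The sketch assumes the same hypothetical `(issuer_key, subject_key, label)` encoding of the certificate graph, and `suspects` is whichever suspect-label set applies (the intersection for one malicious user, or $L_m$). A key is reported as true only if it stays reachable from $k_0$ after removing any single suspect label.

```python
from collections import deque


def reachable_without(edges, k0, kt, banned):
    """BFS from k0 to kt over (issuer_key, subject_key, label) triples,
    ignoring every edge whose label equals `banned` (None bans nothing)."""
    succ = {}
    for u, v, label in edges:
        if label != banned:
            succ.setdefault(u, []).append(v)
    seen, queue = {k0}, deque([k0])
    while queue:
        u = queue.popleft()
        if u == kt:
            return True
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def newly_true(edges, k0, undetermined, suspects):
    """Undetermined keys that no single suspect label can cut off from k0."""
    return sorted(
        kt for kt in undetermined
        if reachable_without(edges, k0, kt, None)
        and all(reachable_without(edges, k0, kt, y) for y in suspects)
    )


edges = [("k0", "kA", "A"), ("k0", "kB", "B"),
         ("kA", "kT", "T"), ("kB", "kT", "T"), ("kA", "kU", "U")]
print(newly_true(edges, "k0", ["kT", "kU"], {"A"}))  # ['kT']
```

Here kU depends only on A's certificate, so it stays undetermined, while kT is reachable along a chain that avoids the suspect A.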



5.3 Suspect Sets (Multiple Colluding Malicious Users) 

In the case of multiple colluding malicious users, we propose to use the following 
rules to construct suspect sets. These rules are presented in decreasing order of 
priority. 

Suspect set rule 1 Given a certificate chain whose head certificate is false, 
construct a new suspect set that contains all the identities (except the identity in 
the head certificate) in the certificates of the chain. 

Suspect set rule 2 Given two certificate chains whose head certificates are 
conflicting with each other, if one of the head certificates is true and the other is 
undetermined, construct a new suspect set that contains all the identities in the 
certificates of the chain whose head certificate is undetermined. 

Suspect set rule 3 Given two certificate chains whose head certificates are 
conflicting with each other, if both head certificates are undetermined, construct 
a new suspect set that contains all the identities in the certificates of the two 
chains. 
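The text omits an algorithm for these rules. A minimal sketch of what each rule produces is given below, assuming (hypothetically) that a chain is represented by the list of identities bound by its certificates, first certificate first and head certificate last. The text does not say how priorities interact beyond applying the rules in order, so this sketch simply applies them in that order and concatenates the results.

```python
def rule1(chain):
    """Rule 1: the head certificate is false; suspect every identity in the
    chain except the one bound by the head certificate."""
    return set(chain) - {chain[-1]}


def rule2(undetermined_chain):
    """Rule 2: conflicting heads, one true; suspect the undetermined chain."""
    return set(undetermined_chain)


def rule3(chain_a, chain_b):
    """Rule 3: conflicting heads, both undetermined; suspect both chains."""
    return set(chain_a) | set(chain_b)


def build_suspect_sets(false_chains, true_vs_undetermined, undetermined_pairs):
    """Apply the three rules in decreasing priority and collect the results."""
    sets = [rule1(c) for c in false_chains]
    sets += [rule2(u) for _, u in true_vs_undetermined]
    sets += [rule3(a, b) for a, b in undetermined_pairs]
    return sets


print(build_suspect_sets([["A", "M", "T"]],
                         [(["B", "T"], ["C", "D", "T"])],
                         [(["E", "U"], ["F", "U"])]))
```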

We do not describe an algorithm that implements these rules, due to space 
limitations. The algorithm is straightforward, and the rules are applied in order. 
To make use of the suspect sets constructed by these rules, we try to find the 
maximum number of disjoint sets (MDS), from all the suspect sets. 

Suppose the maximum number of disjoint sets is found to be a. Let $L_c$ be the union of the a sets. It is clear that at least a malicious users are included in $L_c$. Next, all the edges with a label in $L_c$ are deleted from the certificate graph. By doing this, the maximum number of malicious users with certificates in the certificate graph is reduced from b to no more than $b - a$. In this case, Theorem 2 can be applied to determine whether the rest of the undetermined certificates are true, as follows. For each target public key $k_t$, if there exist $b - a + 1$ label-disjoint paths between $k_0$ and $k_t$, the head certificates of these paths are true. The algorithm described previously for computing the minimum label-cut can be used to solve this problem.
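The deletion step followed by the bounded label-cut test can be sketched as below. It assumes the same hypothetical triple encoding as before and reuses the enumeration idea from section 4: after removing every edge whose label is in $L_c$, a target is accepted when no set of at most $b - a$ labels disconnects it, which is the min-label-cut test the text proposes for the $b - a + 1$ condition. Names are illustrative.

```python
from collections import deque
from itertools import combinations


def reachable(edges, source, target, banned):
    """BFS over (issuer_key, subject_key, label) triples avoiding `banned` labels."""
    succ = {}
    for u, v, label in edges:
        if label not in banned:
            succ.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def no_label_cut_up_to(edges, k0, kt, limit):
    """True if no set of at most `limit` labels disconnects kt from k0."""
    labels = sorted({label for _, _, label in edges})
    return all(reachable(edges, k0, kt, set(cut))
               for size in range(limit + 1)
               for cut in combinations(labels, size))


def prove_after_deletion(edges, k0, targets, lc, b, a):
    """Delete every edge labeled in L_c, then keep targets with no label-cut
    of size <= b - a (the text's test for b - a + 1 label-disjoint paths)."""
    kept = [e for e in edges if e[2] not in lc]
    return sorted(kt for kt in targets if no_label_cut_up_to(kept, k0, kt, b - a))


edges = [("k0", "a", "A"), ("a", "kt", "X"),
         ("k0", "b", "B"), ("b", "kt", "Y"),
         ("k0", "m", "M"), ("m", "kt", "Z")]
print(prove_after_deletion(edges, "k0", ["kt"], {"M"}, b=2, a=1))  # ['kt']
```

With b = 3 and a = 1 the same graph would need three label-disjoint paths after deleting M, and only two remain, so no target is proved.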

We now present experimental evidence about the benefits of using certificate 
conflicts. 
Table 1. Performance for PGP keyring before and after conflict detection, for a total of 10,000 public keys

# of colluding malicious users                             |    1 |  2 |  3 |  4
# of provably true certificates before conflict detection  |   78 | 54 | 40 | 33
# of provably true certificates after conflict detection   | 9992 | 77 | 53 | 40

6 Experimental Results 

We implemented and tested our conflict detection method to investigate its 
practicality and benefits. These experiments only considered the case of one 
legitimate public key per user. We emulated “typical” malicious user behavior, 
in order to contrast the performance before and after conflict detection. 

For test purposes, we used actual PGP keyrings. These were downloaded 
from several PGP keyservers, and the keyanalyze [17] tool was used to extract 
strongly-connected components. In addition, we synthetically generated keyrings 
to have a larger number of test cases. The synthetic data was generated by the 
graph generator BRITE [12]. We used the default configuration file for Barabasi 
graphs, which we believe are similar to actual keyrings. The undirected graphs 
generated by BRITE were converted to directed graphs by replacing each undi- 
rected edge with two directed edges, one in each direction. The number of ver- 
texes in each synthetic key ring was set to 100. For each data point we report, 50 
problem instances were generated (using a different random number generator 
seed each time); the values plotted are the average of these 50 instances. 

The first experiment compares our method with the method of [16] on one 
of the largest PGP keyrings. The graph of this keyring contains 15956 vertexes 
(users) and 100292 edges (certificates). For this PGP keyring we emulated the behavior of n colluding malicious users as follows. First we randomly picked a target, and then n malicious users were randomly selected. The n malicious users issued n false certificates, one per malicious user, each binding the target’s identity to the same false public key.

After emulating this behavior, we applied the method of [16] to determine the maximum set of true certificates. The resulting performance is the performance before conflict detection. Then we applied the suspect set rules for colluding malicious users (from section 5.3) to find many suspect sets, from which we constructed $L_c$. For each of the remaining undetermined public keys, we made use of $L_c$ and the method of section 5.3 to determine whether it was true. The resulting performance is labeled the performance after conflict detection. Table 1 shows the results.

For this very large PGP keyring, it is not practical to evaluate the perfor- 
mance for all users. Instead, we randomly picked 200 users. For each user, we 
randomly selected 50 public keys on which to test our method. Each user’s certificate graph was the entire keyring. Table 1 shows how many public keys can be determined to be true when there are different numbers of malicious users. When there is only a single malicious user, performance is greatly improved (by two orders of magnitude) with the use of conflict detection. In these experiments, the suspect sets turned out to be quite small. The performance with conflict detection dropped dramatically, however, for the case of two or more colluding malicious users, because of a lack of sufficient redundancy in the certificate graph.

For our second experiment, we synthetically generated PGP keyrings. We 
emulated a single malicious user’s behavior, as follows. We randomly picked a 
target, a malicious user, and n certificate issuers. The malicious user was assumed 
to ask the n certificate issuers to certify n different public keys for herself. Using 
these n different public keys, the malicious user created n certificates, one per key, 
each binding the target’s identity to the same false public key. After emulating 
this behavior, we applied the algorithm for computing the minimum label-cut for the case of b = 2, and determined the maximum set of true certificates. The resulting performance was the performance before conflict detection. Then we applied Algorithm 1 to find many suspect sets, from which we constructed $L_s$. For each of the remaining undetermined public keys, we made use of $L_s$ and the method of section 5.1 to try to determine whether it was true. This gave the performance after conflict detection. Each test was run 50 times to obtain averages. The results were:



— Performance before conflict detection was 2%, regardless of the number of 
false certificates. 

— Performance after conflict detection steadily increased from 4% (with 1 false 
certificate) up to 11% (with 19 false certificates). 

The malicious user is faced with a dilemma (fortunately!). While increasing 
the number of false certificates should increase uncertainty about keys, it also 
becomes easier to narrow the list of “suspicious” users, thereby limiting the scope 
of the damage the malicious user can cause. 
Fig. 2. Performance comparison for PGP keyrings before and after conflict detection

Our final experiment was for real PGP keyrings, and demonstrates how performance is improved by conflict detection. The results are shown in Figure 2. For all cases, performance increased when conflict detection was used.

In each PGP key ring, we emulated a single malicious user’s behavior as fol- 
lows. We randomly picked a target, a malicious user, and two certificate issuers. 
Then the malicious user asked for two different public keys, certified by the two 
certificate issuers. Using these two public keys, the malicious user created two 
public key-independent certificate chains to the target. After emulating this behavior, we again applied the method for determining whether a label-cut of size b or less exists, for b = 2, to determine the maximum set of true certificates. The resulting performance is the performance before conflict detection. Then we used the method of section 5.3 to obtain the performance after conflict detection.

All experiments were performed on a Pentium IV, 2.0 GHz PC with 512 MB memory. Running times for Figure 2 ranged from 5 to 30 seconds, except for the graph with 588 vertexes, which required 10 hours of CPU time. We believe
analysis of keyrings for robustness will be done infrequently, in which case these 
execution times should be acceptable. 



7 Conclusion and Future Work 

In this paper, we described the problem of proving certificates are true in webs of trust (such as PGP keyrings). This is a difficult problem because malicious users
may falsify information. Under the assumption that users may legitimately have 
multiple public keys, we showed that redundant identity-independent certificate 
chains are necessary. Previous methods based on public key-independent chains 
are not sufficient under this assumption. 

In the case that certificate conflicts are detected, it is possible to exclude 
certain users from the set of possible malicious users. This allows additional 
certificates to be proved true. Experimental results demonstrated that (a) the 
web of trust is seriously degraded as the number of malicious users increases, 
and (b) the use of conflict detection and redundant certificates substantially 
improves the ability to prove certificates are true. 

Our results show that current PGP keyrings are not particularly resistant 
to attacks by malicious users, particularly colluding users. We are currently 
investigating ways to increase the robustness of webs of trust, such as PGP 
keyrings. 
Abstract. In smartcard encryption and signature applications, randomized algorithms can be used to increase tamper resistance against
attacks based on averaging data-dependent power or EMR variations. 
Oswald and Aigner describe such an algorithm for point multiplication 
in elliptic curve cryptography (ECC). Assuming an attacker can 
identify and distinguish additions and doublings during a single point 
multiplication, it is shown that the algorithm is insecure for repeated 
use of the same secret key without blinding of that key. Thus blinding 
should still be used or great care taken to minimise the differences 
between point additions and doublings. 

Keywords: Addition-subtraction chains, randomized exponentiation, elliptic curve cryptography, ECC, point multiplication, power analysis, SPA, DPA, SEMA, DEMA, blinding, smartcard.



1 Introduction 

Side channel attacks [6,7] on embedded cryptographic systems show that sub- 
stantial data about secret keys can leak from a single application of a crypto- 
graphic function through data-dependent power variation and electro-magnetic 
radiation [12,13]. This is particularly true for crypto-systems which use the com- 
putationally expensive function of exponentiation, such as RSA, ECC and Diffie- 
Hellman. Early attacks required averaging over a number of exponentiations [9] 
to extract meaningful data, but improved techniques mean that single exponen- 
tiations using traditional algorithms may be insecure. In particular, it should be 
assumed that the pattern of squares and multiplies can be extracted fairly accu- 
rately from side channel leakage, perhaps by using Hamming weights to identify 
operand re-use. Where the standard binary “square-and-multiply” algorithm is 
used, this pattern reveals the secret exponent immediately. 

In this context, Oswald and Aigner proposed a randomized point multiplica- 
tion algorithm [10] for which there is no bijection between scalar key values and 
sequences of curve operations. They randomly switch to a different procedure 
for which multiplications appear to occur instead for zero bits but not for one 
bits. This alternative corresponds to a standard recoding of the input bits to remove long sequences of 1s, and introduces other non-zero digits such as $-1$. On
the one hand, the pattern of squares and multiplications is no longer fixed, so 
that averaging power traces from several exponentiations does not make sense, 
and, on the other hand, there is ambiguity about which digit value is associated 
with each multiplication. 

This article analyses the set of randomized traces that would be generated by 
repeated re-use of the same unblinded key k. By aligning corresponding doublings 
in a number of traces, the possible operation sequences associated with bit pairs 
and bit triples of the secret key k can be extracted. With only a few traces 
(ten or so) this provides enough information to determine half the bits of k 
unequivocally, and the rest with a very high degree of certainty. 

Previous work in this area includes [11] and [14]. In [11] Oswald takes a similar
but deterministic algorithm and shows how to determine a space of possible keys 
from one sequence of curve operations, but not how to combine such results 
from different sequences. Here randomization minimises the inter-dependence 
between consecutive operations and so it is unclear whether or not her techniques 
lead to an intractable amount of computing. Okeya & Sakurai [14] treat the 
simple version of the randomized algorithm and succeed in combining results 
from different multiplications by the same key. They require the key k to be re-used $100 + \log_2 k$ times. Here we treat the more complex version of the algorithm in an extended form which might increase security. The analysis of Okeya & Sakurai is inapplicable here because it depends on a fixed finite automaton state occurring after processing a zero bit. However, using new methods we find that a) measurements from only $O(10)$ uses of the secret key reveal the key by applying theory which considers pairs of bits at a time, b) software which considers longer sequences of bits can process just two uses to obtain the key in $O(\log k)$ time,
and c) for standard key lengths and perfect identification of adds and doubles, 
a single use will disclose the key in a tractable amount of time. In addition, our 
attack seems less susceptible to error: key bits are deduced independently so 
that any incorrect deductions affect at most the neighbouring one or two bits. 
In comparison, the attack of Okeya & Sakurai recovers bits sequentially, making 
recovery from errors more complex. 

Although only one algorithm is studied here, a similar overall approach can 
be used to break most randomized recoding procedures under the same condi- 
tions. The two main properties required are: i) after a given sequence of point 
operations, the unprocessed part k' of the key can only have one of a small, 
bounded number of possible values (determined from k by the length of the 
operation sequence but independent of other choices); and ii) it is possible to 
identify an associated subset of trace suffixes for which all members correspond 
to the same value of k' . These also hold for the algorithm proposed by Liardet 
& Smart [8], which uses a sliding window of random, variable width. They seem 
to be the key properties required in [16] to demonstrate similar weaknesses in 
that algorithm. 

Several counter-measures exist against this type of attack. As well as standard blinding by adding a random multiple of the group order to the exponent, different algorithms can be employed, such as [3,5]. Moreover, formulae for point additions and doublings can be made indistinguishable [1,2,4,8].

2 The Oswald-Aigner Exponentiation Algorithm

This section contains a brief outline of the Oswald-Aigner algorithm [10] in terms
of the additive group of points on an elliptic curve E. Rational integers are 
written in lowercase while points on the curve are written in capitals and Greek 
characters denote probabilities. The algorithm computes the point Q = kP for 
a given positive integer k (the secret key) and a given point P on E. 




Fig. 1. Finite automaton for an extension of the algorithm; rb is a random bit.

The algorithm randomly introduces alternative re-codings to the representation of k. It can be viewed as pre-processing bits of k from right to left into a new digit set $\{-1, 0, +1, +2\}$. Then the resulting scheme for point multiplication can be performed in either direction. The conversion uses a carry bit set initially to 0. When this bit is summed with the current bit of k, the result 0, 1 or 2 can be re-coded in different ways: 0 always gives a new digit 0 with carry 0; 1 can give either new digit 1 and carry 0, or new digit $-1$ with carry 1; and 2 gives either new digit 0 and carry 1, or new digit 2 and carry 0. Fig. 1 illustrates this as a finite automaton for a slight extension of the original right-to-left algorithm. It has 4 states, numbered 0 to 3, with the carry being 1 if, and only if, the state is 2.

For the transition from state 2 to state 1, the normal order of doubling and adding is reversed. This achieves the processing for digit value 2. The extension here allows a new transition from state 0 to state 2; the original algorithm is the special case in which the random bit rb = 1 always for state 0. The extension also allows the random bits to be biased for each state. However, if the same distribution of random bits is used for each of the states 0, 1 and 3, the automaton simplifies to just two states, obtained by merging states 0, 1 and 3.

Figure 2 provides equivalent code for the associated right-to-left point mul- 
tiplication. A left-to-right version is also possible, and can be attacked in the 
same way. 



Q ← O ;   /* O is the zero of the elliptic curve */
State ← 0 ;
While k > 0 do
{
   If (k mod 2) = 0 then
      case State of
      {
         0,1,3 : P ← 2P ; State ← 0 ;
         2     : Q ← Q+P ; P ← 2P ; State ← 3 ;
      }
   else
      case State of
      {
         0,1,3 : If rb = 0 then   /* rb is a random bit */
                    { Q ← Q-P ; P ← 2P ; State ← 2 }
                 else
                    { Q ← Q+P ; P ← 2P ; State ← 1 } ;
         2     : If rb = 0 then   /* rb is a random bit */
                    { P ← 2P ; Q ← Q+P ; State ← 1 }
                 else
                    { P ← 2P } ;
      } ;
   k ← k div 2 ;
} ;
If State = 2 then Q ← Q+P ;

Fig. 2. Oswald & Aigner’s randomized signed binary exponentiation (extended). 
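To check the listing, the following Python sketch runs the same right-to-left recoding with integers standing in for curve points. This substitution is an assumption made purely for illustration: integer addition plays the role of point addition and `2 * P` the role of a doubling, so the result must equal k times the input. `probs[s]` is the probability that rb = 1 in state s, i.e. (α, β, γ, δ) of Definition 1 below. Besides Q, the function returns the side-channel trace as a word over {A, D}; its name is hypothetical.

```python
import random


def oswald_aigner(k, P, probs, rng):
    """Right-to-left randomized recoding of Fig. 2 (extended version).

    Integers stand in for curve points. probs[s] is Pr(rb = 1) in state s.
    Returns (Q, trace) with trace a word over {A, D} in time order.
    """
    Q, state, trace = 0, 0, []
    while k > 0:
        rb = 1 if rng.random() < probs[state] else 0
        if k % 2 == 0:
            if state == 2:      # carry 1 + bit 0: digit 1, carry cleared
                Q += P
                P *= 2
                trace.append("AD")
                state = 3
            else:               # carry 0 + bit 0: digit 0
                P *= 2
                trace.append("D")
                state = 0
        elif state == 2:
            if rb == 0:         # carry 1 + bit 1: digit 2, double then add
                P *= 2
                Q += P
                trace.append("DA")
                state = 1
            else:               # digit 0, carry stays 1
                P *= 2
                trace.append("D")
        elif rb == 0:           # carry 0 + bit 1: digit -1, carry 1
            Q -= P
            P *= 2
            trace.append("AD")
            state = 2
        else:                   # carry 0 + bit 1: digit +1, carry 0
            Q += P
            P *= 2
            trace.append("AD")
            state = 1
        k //= 2
    if state == 2:              # absorb the final carry
        Q += P
        trace.append("A")
    return Q, "".join(trace)


if __name__ == "__main__":
    q, trace = oswald_aigner(25, 1, (0.5, 0.5, 0.5, 0.5), random.Random(7))
    print(q, trace)  # q is always 25; the trace varies with the random bits
```

Setting all four probabilities to 1 keeps the automaton in states 0 and 1, which reproduces square-and-multiply; for k = 11001 in binary this gives the trace ADDDADAD discussed in section 4.1.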



3 Efficiency Considerations 

Definition 1. Let $\alpha$, $\beta$, $\gamma$ and $\delta$ be the probabilities that the random bit rb is chosen to be 1 when the current state is 0, 1, 2 or 3 respectively.

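These probabilities can be chosen to improve efficiency or, as we shall see, security. For a key k whose bits are selected independently and at random from a uniform distribution, the matrix of transition probabilities between states of the automaton is then

$$
M = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & \tfrac12 \\
\tfrac{\alpha}{2} & \tfrac{\beta}{2} & \tfrac{1-\gamma}{2} & \tfrac{\delta}{2} \\
\tfrac{1-\alpha}{2} & \tfrac{1-\beta}{2} & \tfrac{\gamma}{2} & \tfrac{1-\delta}{2} \\
0 & 0 & \tfrac12 & 0
\end{pmatrix}
$$

where the entry in row j and column i is the probability of moving from state i to state j when one bit of k is processed.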

Lemma 1. The transition matrix has an eigenvector $(\tfrac12-\mu,\ \tfrac12-2\mu,\ 2\mu,\ \mu)$, where $\mu = \dfrac{2-\alpha-\beta}{12-2\alpha-4\beta-4\gamma+2\delta}$, whose elements are the probabilities associated with each state. Moreover, $0 \le \mu \le \tfrac14$.

This is an easy exercise for the reader. Taking the dot product of this eigenvector with the vector $(\tfrac12, \tfrac12, 1-\tfrac12\gamma, \tfrac12)$ of average additions associated with each state provides the expected number of additions per bit: $\tfrac12 + (1-\gamma)\mu$. The number of doublings is constant at one per bit. So, to minimise the total time we require $(1-\gamma)\mu = 0$, i.e. $(1-\gamma)(2-\alpha-\beta) = 0$, i.e. disallow either the transition from state 2 back to state 1 ($\gamma = 1$), or both transitions to state 2 from states 0 and 1 ($\alpha = \beta = 1$). Avoiding these extremes provides greater randomness. In particular, $\alpha$ and/or $\beta$ should be kept away from 1 so that states 2 and 3 are reachable. In the limit as $\alpha, \beta, \gamma, \delta \to 1$ (which optimises efficiency), on average there is half an addition per bit of k. Thus, a typical addition chain has a little over $\tfrac12 \log_2 k$ additions (or subtractions). Even
a modest bias towards efficiency, such as taking a = (3 = ^ = S> |, changes 
this by just 2 % or less. 
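The closed forms of Lemma 1 and the cost formula above can be checked mechanically. The sketch below (function names hypothetical) builds the transition matrix with exact rational arithmetic, takes the eigenvector of Lemma 1, and evaluates the expected number of additions per bit as the dot product described in the text.

```python
from fractions import Fraction as F

HALF = F(1, 2)


def transition_matrix(a, b, g, d):
    """M[j][i] = Pr(state i -> state j) for one uniformly random key bit."""
    return [
        [HALF, HALF, 0, HALF],
        [a * HALF, b * HALF, (1 - g) * HALF, d * HALF],
        [(1 - a) * HALF, (1 - b) * HALF, g * HALF, (1 - d) * HALF],
        [0, 0, HALF, 0],
    ]


def mu(a, b, g, d):
    return (2 - a - b) / (12 - 2 * a - 4 * b - 4 * g + 2 * d)


def stationary(a, b, g, d):
    m = mu(a, b, g, d)
    return [HALF - m, HALF - 2 * m, 2 * m, m]


def additions_per_bit(a, b, g, d):
    """Dot product of the state probabilities with (1/2, 1/2, 1 - g/2, 1/2)."""
    adds = [HALF, HALF, 1 - g * HALF, HALF]
    return sum(p * x for p, x in zip(stationary(a, b, g, d), adds))


probs = [F(1, 2)] * 4
print(mu(*probs), additions_per_bit(*probs))  # 1/8 9/16
```

Pass the probabilities as `Fraction` values so that the comparisons stay exact; with α = β = γ = δ = ½ the sketch gives μ = 1/8 and 9/16 additions per bit, which agrees with $\tfrac12 + (1-\gamma)\mu$.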

4 The Attack 

4.1 Initial Hypotheses, Notation, and Overview of the Attack 

The attack here assumes sufficiently good monitoring equipment and a suffi- 
ciently weak implementation. Specifically it is assumed that: 

— Adds and doublings can always be identified and distinguished correctly in 
side channel leakage from a single point multiplication; and 

— Side channel traces are available for a number of different uses of the same, 
unblinded key value. 

For ease in calculating probabilities, we assume adds and doublings can al- 
ways be distinguished. Similar results hold if this is only usually the case. 

By the first hypothesis, 

— every side-channel trace tr can be viewed as a word over the alphabet {A, D} 

where A denotes the occurrence of an addition and D that of a doubling. Here, 
as expected, the trace is written with time increasing from left to right. However, 
this is the opposite of the binary representation of the secret key k which is 
processed from right to left, so that the re-coding can be done on the fly (Fig. 2).
For example, if the machine were to cycle round only states 0 and 1 giving the 
sequence of operations for square-and-multiply exponentiation, then the trace 
would be essentially the same as the binary, but reversed: every occurrence of 
0 would appear as D, and every occurrence of 1 would appear as AD. So the 
binary representation 11001 would generate the trace ADDDADAD. There is 
one D for every bit, and we index them to correspond: 

Definition 2. The position of an instance of D in a trace is the number of 
occurrences of D to its left. 

Thus, the leftmost D of ADDDADAD has position 0 and arises from pro- 
cessing the rightmost bit of 11001, which has index 0. 

The attack consists of a systematic treatment of observations like the follow- 
ing. The only transition which places D before rather than after an associated 
occurrence of A is the transition (21). Hence, every occurrence of the substring 
DAAD in a trace tr corresponds to traversing transition (21) then (12) or (11) in 
the finite automaton. This must correspond to processing a bit 1 to reach state 
2, and then two further 1 bits. The trace can be split between the two adjacent 
As into a prefix and a suffix. There is a corresponding split in the binary repre- 
sentation of the secret key k such that the suffix of k has a number of bits equal 
to the number of Ds in the prefix of tr. This enables the position of the substring 
111 to be determined in k. Moreover, by the next lemma, most occurrences of 
111 can be located in this way if enough traces are available: DAAD appears 
exactly when the middle 1 is represented by the transition (21). 

Lemma 2. If 11 occurs in the binary representation of k then the probability of the left-hand 1 being represented by transition (21) in a trace for k is $\pi = 4\mu(1-\gamma)$.

Proof. $4\mu$ is the probability of being in state 2 as a result of the right-hand 1 (state 2 has probability $2\mu$ by Lemma 1 and can only be entered on a 1 bit, which occurs with probability ½), and $1-\gamma$ is the (independent) probability of selecting transition (21) next. □
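The observation before Lemma 2 translates directly into code. This sketch (hypothetical function name) scans a trace for DAAD and, for each occurrence, counts the Ds in the prefix that ends just before the second A; if that count is c, the key bits with indices c − 2, c − 1 and c are all 1.

```python
def locate_111(trace):
    """Return the lowest key-bit index of each 111 block revealed by DAAD.

    The first D of DAAD comes from transition (21), which processes the
    middle 1. If the prefix ending just before the second A contains c Ds,
    the three 1 bits have indices c - 2, c - 1 and c.
    """
    found = []
    start = trace.find("DAAD")
    while start != -1:
        c = trace[:start + 2].count("D")
        found.append(c - 2)
        start = trace.find("DAAD", start + 1)
    return found


print(locate_111("DADDAAD"))  # [1]: bits 1, 2, 3 of k = 1110 (binary) are all 1
```

The trace DADDAAD is one possible trace for k = 1110 in binary (digit 0, then −1 into state 2, then transition (21), then transition (11) or (12)); a square-and-multiply trace such as ADDDADAD contains no DAAD and reveals nothing in this way.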

4.2 Properties of the Traces 

Figure 3 lists the transitions and operation sequences which can occur for each 
bit pair, including the probability of each. It assumes that initial states have the 
probabilities determined by Lemma 1, and that neighbouring bits are unknown. 
The figure enables one to see which bit pairs can arise from given patterns in a 
trace, and to calculate their probabilities: 

Lemma 3. Let $k_i$ denote the bit of k with index i, and $\mu$ be as in Lemma 1. Then,

i) For a given trace, if the Ds in positions i and i+1 are not separated by any As, then the bit pair $k_{i+1}k_i$ is 00 with probability $(2-2\mu(1-\gamma))^{-1}$, which is at least ½. If the Ds are separated by one or more As in any trace, then the bit pair is certainly not 00.

ii) For a given trace, if the Ds in positions i and i+1 are separated by one A, then the bit pair $k_{i+1}k_i$ is 10 with probability ½. If the Ds are separated by no As or two As in any trace, then the bit pair is certainly not 10.

iii) For a given trace, if the Ds in positions i and i+1 are separated by two As, then the bit pair $k_{i+1}k_i$ is certainly 11. The probability of two As when the bit pair is 11 is $2\mu(1-\gamma)$, assuming bit $k_{i-1}$ is unknown.

iv) For a set of n traces, suppose the Ds in positions i and i+1 are separated by no As in some cases, by one A in some cases, and by two As in no cases. Then the bit pair $k_{i+1}k_i$ is 01 with probability $(1+(1-2\mu(1-\gamma))^n)^{-1}$.



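Bit pair | Operation patterns | State sequences | Probabilities, given the bit pair
00       | D.D                | 000, 100, 300   | $1-2\mu$
         | AD.D               | 230             | $2\mu$
10       | D.AD               | 001, 002        | $\tfrac12-\mu$
         | D.AD               | 101, 102        | $\tfrac12-2\mu$
         | AD.AD              | 231, 232        | $2\mu$
         | D.AD               | 301, 302        | $\mu$
01       | AD.D, AD.AD        | 010, 023        | $(\tfrac12-\mu)\alpha$, $(\tfrac12-\mu)(1-\alpha)$
         | AD.D, AD.AD        | 110, 123        | $(\tfrac12-2\mu)\beta$, $(\tfrac12-2\mu)(1-\beta)$
         | DA.D, D.AD         | 210, 223        | $2\mu(1-\gamma)$, $2\mu\gamma$
         | AD.D, AD.AD        | 310, 323        | $\mu\delta$, $\mu(1-\delta)$
11       | AD.AD, AD.AD       | 011, 012        | $(\tfrac12-\mu)\alpha\beta$, $(\tfrac12-\mu)\alpha(1-\beta)$
         | AD.DA, AD.D        | 021, 022        | $(\tfrac12-\mu)(1-\alpha)(1-\gamma)$, $(\tfrac12-\mu)(1-\alpha)\gamma$
         | AD.AD, AD.AD       | 111, 112        | $(\tfrac12-2\mu)\beta^2$, $(\tfrac12-2\mu)\beta(1-\beta)$
         | AD.DA, AD.D        | 121, 122        | $(\tfrac12-2\mu)(1-\beta)(1-\gamma)$, $(\tfrac12-2\mu)(1-\beta)\gamma$
         | DA.AD, DA.AD       | 211, 212        | $2\mu(1-\gamma)\beta$, $2\mu(1-\gamma)(1-\beta)$
         | D.DA, D.D          | 221, 222        | $2\mu\gamma(1-\gamma)$, $2\mu\gamma^2$
         | AD.AD, AD.AD       | 311, 312        | $\mu\delta\beta$, $\mu\delta(1-\beta)$
         | AD.DA, AD.D        | 321, 322        | $\mu(1-\delta)(1-\gamma)$, $\mu(1-\delta)\gamma$

Fig. 3. All possible operation sequences for all bit pairs, and their probabilities given the bit pair occurs. (Bit pairs are processed right to left and operations left to right.)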



Proof. i) First, by inspection of the finite automaton, the only possible operation sequences for 00 are AD.D and D.D, so the Ds are always adjacent. The intervention of an A will prove that the bit pair is not 00.

Suppose there is no intervening A between the two specified Ds. Using Figure 3, if the bit pair is 00 then the probability of this is $\pi_{00} = 1$; if the bit pair is 10 then the probability is $\pi_{10} = 0$; if the bit pair is 01 then the probability is $\pi_{01} = (\tfrac12-\mu)\alpha + (\tfrac12-2\mu)\beta + \mu\delta$; and if the bit pair is 11 then the probability is $\pi_{11} = (\tfrac12-\mu)(1-\alpha) + (\tfrac12-2\mu)(1-\beta) + 2\mu\gamma + \mu(1-\delta)$. Adding the last two gives $\pi_{01}+\pi_{11} = 1-2\mu(1-\gamma)$. Thus, the correct deduction of 00 is made with probability

$$\frac{\pi_{00}}{\pi_{00}+\pi_{10}+\pi_{01}+\pi_{11}} = \frac{1}{2-2\mu(1-\gamma)} \ge \frac12.$$

ii) Similarly, from Figure 3 the bit pair 10 must always include the operation A once between the two occurrences of D, but this is not the case for any other bit pair. Thus the absence of an A, or the presence of two As, guarantees that the bit pair is not 10. However, suppose there is exactly one A between the specified Ds. By Figure 3, if the bit pair is 00 then the probability of this is $\pi'_{00} = 1-\pi_{00} = 0$; if the bit pair is 10 then the probability is $\pi'_{10} = 1-\pi_{10} = 1$; if the bit pair is 01 then the probability is $\pi'_{01} = 1-\pi_{01}$; and if the bit pair is 11 then the probability is $\pi'_{11} = 1-\pi_{11}-2\mu(1-\gamma)$. Since $\pi_{01}+\pi_{11} = 1-2\mu(1-\gamma)$, these four probabilities sum to 2, so the correct deduction of 10 is made with probability

$$\frac{\pi'_{10}}{\pi'_{00}+\pi'_{10}+\pi'_{01}+\pi'_{11}} = \frac12.$$

iii) This part is immediate from Figure 3. 

iv) Finally, by parts (i) and (ii), a bit pair which exhibits both no As and one A between the specified Ds (in different traces) cannot be 00 or 10; it must be 01 or 11. The probability of not having two As in any trace when the bit pair is 01 is, of course, 1. By Fig. 3 the probability of not having two As in any of the $n$ traces when the bit pair is 11 is $\pi_n = (1-2\mu(1-\gamma))^n$. Hence the probability of the pair being 01 rather than 11 is $1/(1+\pi_n)$. □
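The bookkeeping in this proof can be checked mechanically. The Python sketch below is a transcription of Figure 3, not code from the original work: it counts the As between the two Ds of each operation pattern, sums the row probabilities per bit pair, and compares the ratios of parts (i) and (ii) with their closed forms in exact rational arithmetic. The helper names and the test values of $\alpha, \beta, \gamma, \delta, \mu$ are arbitrary illustrative choices; as in the lemma, $\mu$ is treated as a free parameter.

```python
from fractions import Fraction as F

def figure3_rows(alpha, beta, gamma, delta, mu):
    """Rows of Figure 3 as (bit pair, operation pattern, probability given the pair)."""
    h = F(1, 2)
    s0, s1, s2, s3 = h - mu, h - 2 * mu, 2 * mu, mu  # probabilities of the start states
    return [
        ("00", "D.D", s0 + s1 + s3), ("00", "AD.D", s2),
        ("10", "D.AD", s0), ("10", "D.AD", s1), ("10", "AD.AD", s2), ("10", "D.AD", s3),
        ("01", "AD.D", s0 * alpha), ("01", "AD.AD", s0 * (1 - alpha)),
        ("01", "AD.D", s1 * beta), ("01", "AD.AD", s1 * (1 - beta)),
        ("01", "DA.D", s2 * (1 - gamma)), ("01", "D.AD", s2 * gamma),
        ("01", "AD.D", s3 * delta), ("01", "AD.AD", s3 * (1 - delta)),
        ("11", "AD.AD", s0 * alpha * beta), ("11", "AD.AD", s0 * alpha * (1 - beta)),
        ("11", "AD.DA", s0 * (1 - alpha) * (1 - gamma)), ("11", "AD.D", s0 * (1 - alpha) * gamma),
        ("11", "AD.AD", s1 * beta * beta), ("11", "AD.AD", s1 * beta * (1 - beta)),
        ("11", "AD.DA", s1 * (1 - beta) * (1 - gamma)), ("11", "AD.D", s1 * (1 - beta) * gamma),
        ("11", "DA.AD", s2 * (1 - gamma) * beta), ("11", "DA.AD", s2 * (1 - gamma) * (1 - beta)),
        ("11", "D.DA", s2 * gamma * (1 - gamma)), ("11", "D.D", s2 * gamma * gamma),
        ("11", "AD.AD", s3 * delta * beta), ("11", "AD.AD", s3 * delta * (1 - beta)),
        ("11", "AD.DA", s3 * (1 - delta) * (1 - gamma)), ("11", "AD.D", s3 * (1 - delta) * gamma),
    ]

def as_between_ds(pattern):
    """Number of As strictly between the two Ds of a two-bit operation pattern."""
    ops = pattern.replace(".", "")
    first = ops.index("D")
    second = ops.index("D", first + 1)
    return ops[first + 1:second].count("A")

def a_count_distribution(alpha, beta, gamma, delta, mu):
    """P(number of intervening As = c | bit pair) for c = 0, 1, 2."""
    dist = {}
    for pair, pattern, prob in figure3_rows(alpha, beta, gamma, delta, mu):
        counts = dist.setdefault(pair, {0: F(0), 1: F(0), 2: F(0)})
        counts[as_between_ds(pattern)] += prob
    return dist

# Usage with arbitrary parameter values (mu must not exceed 1/4).
alpha, beta, gamma, delta, mu = F(3, 5), F(1, 3), F(1, 4), F(2, 7), F(1, 10)
dist = a_count_distribution(alpha, beta, gamma, delta, mu)
no_a = {pair: c[0] for pair, c in dist.items()}
one_a = {pair: c[1] for pair, c in dist.items()}
print("part (i): ", no_a["00"] / sum(no_a.values()), 1 / (2 - 2 * mu * (1 - gamma)))
print("part (ii):", one_a["10"] / sum(one_a.values()))
```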

We must be a little careful in the application of this lemma. Firstly, each part assumes no knowledge of bit $k_{i-1}$; knowing it changes the probabilities. In most cases the differences are small enough to be considered negligible. For accurate figures, the table can be used to select just the cases starting in states 0 or 3 when the preceding processed bit is 0, and the cases starting in states 1 or 2 when that bit is 1. The only case where a qualitative difference occurs is 11, for which AA only occurs if $k_{i-1} = 1$. When $k_{i-1} = 0$, this means we cannot distinguish 01 from 11 so easily. This is a typical problem to be solved when reconstructing the whole key.

Secondly, deductions from different traces are not independent. For example, suppose all of $n$ traces have one A between the Ds in positions $i$ and $i+1$. From (ii) of the lemma it is tempting to deduce that the bit pair is 10 with probability $1-(\tfrac12)^n$. However, the probability of this may still only be $\tfrac12$. In particular, this happens when the parameters $\alpha = \beta = \delta = 0$ are selected. Then the bit pairs 10 and 01 would always have exactly one A between the Ds, and bit pairs 00 and 11 would never have any As. So 01 and 10 would be equally likely, each with probability $\tfrac12$, if exactly one A always occurred. The independent decisions which can be combined are those based on the independent choices of random bits, as in (iv).

4.3 Reconstructing the Key 

For this section we assume the default values which give the original algorithm, namely $\alpha = 1$ and $\beta = \gamma = \delta = \tfrac12$. This means $\mu = \tfrac1{14}$. Later we consider alternatives which might improve security. Then Figure 3 immediately yields:
Lemma 4. For the above default values of the parameters,

i) the bit pair 01 has no intervening A between the associated Ds of a trace with probability $\tfrac9{14}$ and one intervening A with probability $\tfrac5{14}$;

ii) the bit pair 11 has no intervening A between the associated Ds with probability $\tfrac27$, one intervening A with probability $\tfrac9{14}$, and two As with probability $\tfrac1{14}$.

The choices which lead to the probabilities in the previous lemma are made independently for each trace. Hence, for $n$ traces and a pair 01, there are no As in every trace with probability $(\tfrac9{14})^n$ and one A in every trace with probability $(\tfrac5{14})^n$. A similar result holds for the pair 11. By averaging:

Lemma 5. For the default values of the parameters and $n$ traces, a bit pair of the form *1 has, in every one of the traces:

i) no As between the associated Ds with probability $\{(\tfrac9{14})^n + (\tfrac27)^n\}/2$; and

ii) one A with probability $\{(\tfrac5{14})^n + (\tfrac9{14})^n\}/2$.
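The default-parameter figures in Lemmas 4 to 6 can be reproduced with a short computation. This sketch rests on two assumptions that are ours rather than the text's: key bits are independent and uniformly random, and the transitions of the automaton are those implied by the state sequences of Figure 3 (on a 0 bit, states 0, 1 and 3 move to state 0 and state 2 moves to state 3; on a 1 bit, states 0, 1, 2 and 3 move to state 1 with probabilities $\alpha$, $\beta$, $1-\gamma$ and $\delta$ respectively, and otherwise to state 2). Under these assumptions it solves exactly for the stationary state probabilities, whose last entry is $\mu$, and then evaluates the lemmas for $n = 10$.

```python
from fractions import Fraction as F

def transition_matrix(alpha, beta, gamma, delta):
    """T[s][t] = P(state s -> state t) for one uniformly random key bit.

    Assumption: transitions are read off the state sequences of Figure 3.
    """
    half = F(1, 2)
    on_zero = {0: {0: F(1)}, 1: {0: F(1)}, 2: {3: F(1)}, 3: {0: F(1)}}
    on_one = {0: {1: alpha, 2: 1 - alpha}, 1: {1: beta, 2: 1 - beta},
              2: {1: 1 - gamma, 2: gamma}, 3: {1: delta, 2: 1 - delta}}
    T = [[F(0)] * 4 for _ in range(4)]
    for table in (on_zero, on_one):
        for s, targets in table.items():
            for t, p in targets.items():
                T[s][t] += half * p
    return T

def stationary(T):
    """Exact solution of pi = pi T with sum(pi) = 1, by Gauss-Jordan elimination."""
    n = len(T)
    # Balance equations for states 0..n-2; normalisation replaces the last one.
    rows = [[T[s][t] - (1 if s == t else 0) for s in range(n)] + [F(0)] for t in range(n - 1)]
    rows.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

alpha, beta, gamma, delta = F(1), F(1, 2), F(1, 2), F(1, 2)   # defaults of Section 4.3
pi = stationary(transition_matrix(alpha, beta, gamma, delta))
mu = pi[3]                                                     # state 3 has probability mu
h = F(1, 2)
p01_none = (h - mu) * alpha + (h - 2 * mu) * beta + mu * delta                       # Lemma 4 (i)
p01_one = 1 - p01_none
p11_none = (h - mu) * (1 - alpha) + (h - 2 * mu) * (1 - beta) + 2 * mu * gamma + mu * (1 - delta)
p11_two = 2 * mu * (1 - gamma)                                                       # AA, k_{i-1} unknown
p11_one = 1 - p11_none - p11_two                                                     # Lemma 4 (ii)

n = 10
none_all = (p01_none ** n + p11_none ** n) / 2    # Lemma 5 (i)
one_all = (p01_one ** n + p11_one ** n) / 2       # Lemma 5 (ii)
# AA needs k_{i-1} = 1 (probability 1/2), so given a 1 to the right its per-trace
# rate is 2 * p11_two; Lemma 6 is the chance of seeing AA in at least one trace.
aa_seen = 1 - (1 - 2 * p11_two) ** n
print(mu, p01_none, p11_none, p11_one, p11_two)
print(float(1 / none_all), float(1 / one_all), float(1 / (none_all + one_all)), float(aa_seen))
```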

To reconstruct the key $k$, first classify every bit pair as 00 if there are no intervening As in any trace, 10 if there is always one intervening A, 11 if there is an intervening AA, and, otherwise, *1 if there is a variable number of intervening As. This correctly classifies all pairs 00 and 10, and pairs classed as 11 or *1 are certainly 11 or of the form *1 respectively. For $n = 10$ both probabilities in the lemma are bounded above by approximately $\tfrac1{166}$. Thus about 1 in 83 bit pairs 01 and 11 will be incorrectly classified as 00 or 10. Also, by the next lemma, $1-(\tfrac67)^{10} > \tfrac34$ of pairs 11 will be located correctly by occurrences of AA when they are the left pair in triplets 111. Its proof goes back to Lemma 2.

Lemma 6. For the default values of the parameters and $n$ traces, the bit pair 11 has at least one trace exhibiting AA with probability $1-(\tfrac67)^n$ if it has a 1 to the right, and with probability 0 if it has a 0 to the right.

This is now enough information to deduce almost all the bits of a standard 
length ECC key. Every bit which is deduced as the right member of a pair *1 is 
correctly classified as 1 since the mixture of patterns used in the classification 
is not possible for pairs of the form *0. However, about 1 in 83+1 of the bits 
which are deduced to be right members of a pair *0 is incorrectly classified as 0 
because not all the possible patterns for the bit pair have occurred. In an ECC 
key of, say, 192 bits, about two bits will then be incorrect. 
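The classification rule and the resulting bit deduction can be sketched in a few lines of Python. The input format (one list per bit pair holding the number of As seen between its two Ds in each trace) and the function names are hypothetical; extracting those counts from real traces is outside the scope of the sketch.

```python
from typing import List, Sequence

def classify_pair(a_counts: Sequence[int]) -> str:
    """Classify the bit pair k_{i+1}k_i from the number of As between its Ds in each trace."""
    seen = set(a_counts)
    if 2 in seen:
        return "11"   # an intervening AA only occurs for the pair 11 (Lemma 3 iii)
    if seen == {0}:
        return "00"   # never an intervening A
    if seen == {1}:
        return "10"   # always exactly one intervening A
    return "*1"       # a mixture of no A and one A

def deduce_bits(pairs: List[Sequence[int]]) -> List[int]:
    """pairs[i] holds the A counts for the pair whose right-hand bit is k_i.

    The right-hand bit is certainly 1 for classes 11 and *1 and is taken to be 0
    otherwise; the rare misclassified pairs *1 are what the later search corrects.
    """
    return [1 if classify_pair(counts) in ("11", "*1") else 0 for counts in pairs]

print(deduce_bits([[0, 0, 0], [1, 1, 1], [0, 1, 0], [0, 2, 1]]))
```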

Each bit $b$ belongs to two pairs: $*b$ and $b*$, say. Traces for the pair $*b$ have been used to classify $b$. In half of all cases, there is a 0 bit to the right, and the characteristic patterns of traces for the pair $b0$ can be used to cross-check the classification. In the other half of cases the patterns for $b1$ also indicate the correct value for $b$ as a result of the ratios between the numbers of occurrences of each pattern. However, the patterns observed for overlapping bit pairs are not independent. Although it is unlikely, one set of patterns may reinforce rather than contradict a wrong deduction from the other set. There is no space for further detail, but the following is now clear:
Theorem 1. Suppose elliptic curve adds and doubles can be distinguished accurately on a side channel. If the original Oswald-Aigner exponentiation algorithm is used with the same unblinded 192-bit ECC key $k$ for 10 point multiplications, then approximately half the bits can be deduced unambiguously to be 1, and the remaining bits deduced to be 0 with an average of at most about two errors.

This theorem says that a typical ECC secret key can usually be recovered on a first attempt using a dozen traces, with very little computational effort beyond extracting the add and double patterns from each trace. By checking consistency between deductions of overlapping bit pairs, most errors should be eliminated. In any case, it is computationally feasible to test all variants of the deduced key with up to two or three errors. The correct one from this set can surely be established by successfully decrypting some ciphertext.
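The final search over variants of the deduced key can be organised as below. This is a minimal sketch: `is_correct_key` is a hypothetical callback standing for whatever test the attacker has available, such as decrypting a known ciphertext. Only bits deduced as 0 are flipped, because bits deduced as 1 are certain.

```python
from itertools import combinations
from math import comb
from typing import Callable, List, Optional

def correct_key(bits: List[int], is_correct_key: Callable[[List[int]], bool],
                max_errors: int = 3) -> Optional[List[int]]:
    """Try the deduced key, then variants with 1, 2, ..., max_errors zero bits set to 1."""
    zero_positions = [i for i, b in enumerate(bits) if b == 0]
    for errors in range(max_errors + 1):
        for positions in combinations(zero_positions, errors):
            candidate = list(bits)
            for i in positions:
                candidate[i] = 1
            if is_correct_key(candidate):
                return candidate
    return None  # more errors than max_errors

def candidate_count(zero_bits: int, max_errors: int = 3) -> int:
    """Number of keys tested in the worst case."""
    return sum(comb(zero_bits, e) for e in range(max_errors + 1))

print(candidate_count(96, 3))
```

For a 192-bit key with about 96 bits deduced as 0, the worst case is $1 + 96 + 4560 + 142880 = 147537$ candidate keys for up to three errors.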

4.4 Secure Parameter Choices? 

From the last section, it is clear that greater security could only arise from making it less easy to distinguish between pairs of the form *0 and those of the form *1. This requires choosing parameters for which 01 and 11 are less likely to exhibit both no As and one A between the relevant Ds. From Fig. 3, the probability of no As for 01 and the probability of one A for 11 are the same, viz.

$$\pi = (\tfrac12-\mu)\alpha + (\tfrac12-2\mu)\beta + \mu\delta.$$

So this must be made close to 0 or close to 1.

For example, choosing $\alpha = \beta = 1$ makes $\mu = 0$ and so $\pi = 1$, whereas choosing $\alpha = \beta = \delta = 0$ makes $\pi = 0$. Thus both limits are possible. In general, for $\pi = 1$ (the first case) the traces match the pattern of operations for normal square-and-multiply, so we expect each A to correspond to the multiply of a 1 bit.

Although 00 and 01 are indistinguishable from the patterns, and 10 and 11 are indistinguishable (unless perhaps AA could occur), the attacker now recognises that patterns for the pairs 0* have no intervening A and patterns for the pairs 1* have one intervening A. This gives him each bit unequivocally. At the opposite extreme, if $\pi = 0$ (the second case) then 10 and 01 become indistinguishable from the patterns, as do 00 and 11 (again, unless perhaps AA could occur). Now the attacker can tell pairs with equal bits from pairs with different bits. Knowing the first bit is 1, he can deduce all the bits one by one from left to right, and hence the key $k$.

In general the attacker can exploit the complementary frequencies of one A 
for the pairs 01 and 11. Either they are close enough to ensure n traces usually 
display both patterns (as in the previous section) or they are distinct enough for 
the patterns to be strongly biased in opposite directions in the trace set (as in 
the previous paragraph). He can then recognise either the equality of the second
bits or the difference in the first bit respectively, and use the fact that each bit 
belongs to two pairs to cross-check the deduction of many bits. Consequently, 
there are no secure choices of the parameters under repeated use of the unblinded 
key k. 
Identical working to the previous section shows that similar computations can be performed for keys of any length. With the choice of parameters there, the number of traces needed to achieve a specified degree of confidence in the determined bits is $n = O(\log\log k)$, because we want at most one error in $(\tfrac{14}9)^n = O(\log k)$ bits. The same calculations apply for any $\pi$ which is not 0 or 1, giving the same size order for $n$. For the working above in this section, mistakes are only made when too many traces record the opposite pattern to that expected from the value of $\pi$. Then, for $\pi$ close enough to 0 or 1, the same bound on the size of $n$ can be obtained for limiting the errors.

Theorem 2. No choice of algorithm parameters is secure for a reasonable key length under the above attack if $O((\log k)^2)$ decipherings are computationally feasible and $O(\log\log k)$ traces are available from point multiplications using the same unblinded key.
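The $O(\log\log k)$ growth can be illustrated numerically. This sketch assumes the default parameters of Section 4.3, counts only misclassified pairs *1 (using the exact rates of Lemma 5), and assumes that about half of the key bits are right members of such pairs; the function names are illustrative. It returns the smallest number of traces for which the expected number of errors is at most one.

```python
import math

def pair_error_rate(n: int) -> float:
    """Chance that a pair *1 shows the same A count in all n traces (Lemma 5, defaults)."""
    return ((9 / 14) ** n + (2 / 7) ** n) / 2 + ((5 / 14) ** n + (9 / 14) ** n) / 2

def traces_needed(key_bits: int, max_expected_errors: float = 1.0) -> int:
    """Smallest n for which about key_bits/2 pairs *1 give at most the given expected errors."""
    n = 1
    while (key_bits / 2) * pair_error_rate(n) > max_expected_errors:
        n += 1
    return n

for bits in (192, 256, 521, 2 ** 16, 2 ** 32):
    # Compare with log(log2 k), where log2 k = key length in bits.
    print(bits, traces_needed(bits), round(math.log(bits), 2))
```

For a 192-bit key this criterion gives 11 traces, of the same magnitude as the 10 traces of Theorem 1, and the count grows only with the logarithm of the key length.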

When adds and doubles are not distinguished with 100% certainty, the proportions of the numbers of As can be used to assign a likelihood to the correctness of the selected bit pair. Those which are most likely to be wrong can be modified first, thereby decreasing the search time to determine the correct key.



4.5 Counter-Measures 

In the absence of a secure set of parameter choices, further counter-measures are required. The most obvious counter-measure is to restore key blinding. A small number of blinding bits might still result in the attacker's desired 10 or so traces for the same key eventually becoming available. These might be identified easily within a much larger set of traces by the large number of character subsequences shared between their traces. So the number of possible values of the random number used in blinding cannot reasonably be less than the maximum lifespan of the key, measured in the number of point multiplications for which it is used. Thus 16 or more bits are needed, adding around 10% to the cost of point multiplication.
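Key blinding replaces $k$ by $k + r\,m$ for a fresh random $r$, where $m$ is a multiple of the order of the base point, so each use of the key processes a different digit string but yields the same result. The sketch below illustrates the principle in a toy setting that is our substitution, not ECC: modular exponentiation in the multiplicative group modulo the prime $2^{61}-1$ stands in for point multiplication, and $p-1$ plays the role of the group order. The 16-bit blinding width follows the text.

```python
import secrets

def blind_scalar(k: int, group_order: int, blinding_bits: int = 16) -> int:
    """Key blinding: k' = k + r * group_order for a fresh random r of blinding_bits bits."""
    r = secrets.randbits(blinding_bits)
    return k + r * group_order

# Toy stand-in for point multiplication: exponentiation modulo the prime p = 2^61 - 1,
# whose group order p - 1 is a multiple of the order of g (Fermat's little theorem).
p = 2 ** 61 - 1
g = 3
k = 0x1F2E3D4C5B6A7988 % (p - 1)
for _ in range(3):
    kb = blind_scalar(k, p - 1)
    assert pow(g, kb, p) == pow(g, k, p)   # same result, different scalar each time
print("extra scalar length for a 192-bit key: %.1f%%" % (100 * 16 / 192))
```

The extra 16 bits lengthen a 192-bit scalar by about 8%, which is consistent with the estimate of around 10% above.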

Identical formulae for additions and doublings are increasingly efficient and 
applicable to wider classes of elliptic curves, those of Brier and Joye [1] in par- 
ticular. These should make it more difficult to distinguish adds from doubles. 
Another favoured counter-measure is the add-and-always-double approach. Then 
the pattern of adds and doubles is not key dependent. Each occurrence of DD 
has an add inserted to yield the pattern DAD, but the add output is discarded 
without having been used. This can also be done for the Oswald-Aigner algorithm provided, in addition, an extra double is performed to convert each DAAD into DADAD. The output of this double is likewise ignored.
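The padding described above can be expressed as a rewrite of the operation string. This sketch manipulates only the pattern, not the point arithmetic; lowercase a and d are our notation for the dummy add and dummy double whose outputs are discarded.

```python
def always_double_pattern(ops: str) -> str:
    """Insert dummy operations so that doubles and adds strictly alternate.

    'a' marks a dummy add inserted between two consecutive doubles (DD -> DaD);
    'd' marks a dummy double inserted between two consecutive adds (DAAD -> DAdAD).
    """
    out = []
    for op in ops:
        if out:
            prev = out[-1].upper()
            if prev == "D" and op == "D":
                out.append("a")
            elif prev == "A" and op == "A":
                out.append("d")
        out.append(op)
    return "".join(out)

print(always_double_pattern("DDAADDAD"))
```

Ignoring case, every output is an alternation of D and A, so the observable pattern no longer depends on the key.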

Alternative algorithms exist. That described by Joye and Yen [5] is another
add-and-always-double algorithm. There are also several randomized methods [3, 
15] which seem to be more robust because they do not satisfy the two properties 
identified in the introduction as those to which the above attack can be applied. 
5 One Trace

It is interesting to speculate on how much data leaks from a single point mul- 
tiplication since the above counter-measures should prevent re-use of identical 
values for the same key. Oswald [10] noted that for some deterministic re-coding 
algorithms in which several non-zero digits generate indistinguishable As, the 
operation patterns resulting from numbers of up to 12 bits could only represent 
at most 3 keys. By breaking a standard ECC key into 12-bit sections, this means 
very few keys actually generate an observed pattern of operations. Moreover,
these can be ordered according to their likelihood of occurrence, and this con- 
siderably reduces the average search time for the correct key. Hence the key can 
be recovered quite easily. 

Is the same possible here? In [10] she also writes that the same attack is 
possible on randomized algorithms with weaker results, but provides no detail. 
Randomized algorithms have much weaker inter-dependencies between adjacent 
operation patterns. This should substantially increase the number of keys which 
match a specific pattern of point operations. The key Lemma 3 above does not 
provide certainty for many bits unless a number of traces are available; only 
the infrequent instances of AA seem to allow definite determination of any bits 
from one trace. Of course, an analysis of sub-sequences of more than two bits is 
possible, as in [14], but, besides better probabilities, this gives no further insight 
into whether it is computationally feasible to recover the key from a single trace. 

Instead, software was written to enumerate all the keys which could represent 
a given string. On average, for the extended version of the algorithm, the trend 
up to 16-bit keys indicates clearly that a little over O(v^) keys will match a 
given pattern - under 20 match a given 16-bit pattern. This would appear to 
ensure the strength of the algorithm when a key is used just once but only if the 
key has at least 2® bits or there is considerable ambiguity in the side channel 
about whether the operations are adds or doubles. The original algorithm has 
fewer random choices, and so has even fewer keys matching a given pattern. 
Thus, a standard ECC key could be recovered from a single trace in feasible 
time if adds and doubles are clearly distinguishable. 

6 Conclusion 

One of several, similar, randomized exponentiation algorithms has been investi- 
gated to assess its strength against a side channel attack which can differentiate 
between elliptic curve point additions and point doublings. Straightforward theory shows that at most $O(10)$ uses of the same unblinded key will enable a secret
key of standard length to be recovered easily in a computationally feasible time. 
No choice of parameters improves security enough to alter this conclusion. Us- 
ing longer bit sequences than the theory, it is also clear that software can search 
successfully for keys when just one side channel trace is available. However, this 
number may need increasing if adds and doubles might be confused or standards 
for key lengths are increased. 
The main property which is common to algorithms which can be attacked
in this way seems to be that the next subsequence of operations at a given 
point in the processing of the key must be chosen from a small, bounded set of 
possibilities which is derived from the key and the position, but is independent 
of previous choices. Hence, our overall conclusion is that such algorithms should 
be avoided for repeated use of the same unblinded key if adds and doubles can 
be differentiated with any degree of certainty. Furthermore, for typical ECC key 
lengths, a single use may be sufficient to disclose the key when adds and doubles 
are accurately distinguishable. 
Abstract. Many hardware countermeasures against differential power analysis (DPA) attacks have been developed in recent years. Designers of cryptographic devices using such countermeasures to protect their devices have the challenging task of selecting and implementing a suitable combination of countermeasures. Every device has different requirements, and so there is no universal solution to protect devices against DPA attacks.

In this article, a statistical approach is pursued to determine the effect 
of hardware countermeasures on the number of samples needed in DPA 
attacks. This approach results in a calculation method that enables 
designers to assess the resistance of their devices against DPA attacks 
throughout the design process. This way, different combinations of 
countermeasures can be easily compared and costly design iterations 
can be avoided. 

Keywords: Smart cards, Side-Channel Attacks, Differential Power Analysis (DPA), Hardware countermeasures



1 Introduction 

During the last years, a lot of effort has been dedicated towards the research 
of side-channel attacks [1,9,10] and the development of corresponding counter- 
measures. In particular, there have been many endeavors to develop effective 
countermeasures against differential power analysis (DPA) [10,15] attacks. 

DPA attacks are based on the fact that the power consumption of a cryp- 
tographic device depends on the internally used secret key. Since this property 
can be exploited with relatively cheap equipment, DPA attacks pose a serious 
practical threat to cryptographic devices, like smart cards. 

The countermeasures that have been developed so far against these attacks can be categorized into two groups. The first group consists of the so-called algorithmic countermeasures [4,5,7,14,22]. The basic idea of these countermeasures is to randomize the intermediate results that are processed during the execution of a cryptographic algorithm. Classical first-order DPA attacks are rendered practically impossible if this randomization is implemented correctly. However, there are two significant drawbacks of this approach.

The first one is that the randomization is quite expensive to implement for 
non-linear operations as they are used in symmetric ciphers (see for example [5], 
[6] and [7]). The second one is that many algorithmic countermeasures do not 
provide sufficient protection against higher-order DPA attacks [13] or sophisti- 
cated SPA attacks [11,18]. The consequence of these facts is that algorithmic 
countermeasures are typically combined with hardware countermeasures [2,8,12, 
16,19,20,21]. 

The hardware approach to counteract DPA attacks differs significantly from 
the algorithmic one. The intermediate results that occur during the execution 
of a cryptographic algorithm are not affected by this type of countermeasure. 
Instead, the goal of this approach is to bury the attackable part of the power 
consumption in different kinds of noise. 

The more noise there is in the power traces recorded by the attacker, the more 
measurements are needed for a successful DPA attack. Although the basic idea 
is relatively simple, hardware countermeasures have proven to be quite effective 
in practice. This is why cryptographic devices are typically either protected by a 
combination of hardware and algorithmic countermeasures or solely by hardware 
countermeasures. 

The decision about which combination of countermeasures is implemented in a device is made by the designers. It is their task to choose a combination of countermeasures that provides the resistance against DPA attacks that is necessary for the planned application of the device. The resistance against DPA attacks is typically specified by a number of samples: if DPA attacks with this number of samples fail, the device is resistant enough. Otherwise the requirements are not fulfilled.

Choosing a suitable combination of countermeasures is a very challenging task in practice, because this decision needs to be made at a very early stage of the design process. Design iterations are costly, and so the fabrication of a physical prototype to test whether a combination of countermeasures is sufficient should be avoided.

In order to minimize the number of design iterations, methods are necessary 
to assess the effect of countermeasures on the number of samples. However, par- 
ticularly for hardware countermeasures there are no publications that discuss 
such methods. Publications of hardware countermeasures usually just contain 
case studies showing that the proposed countermeasure really increases the num- 
ber of samples. Yet, such case studies are only of limited use for a designer of a 
device who uses a different technology, a different architecture, and potentially 
even uses multiple countermeasures simultaneously. 

In this article, a statistical approach is pursued to determine the effect of 
hardware countermeasures on the number of samples. This approach leads to a 
calculation method that allows the determination of lower bounds for the number
of samples needed in DPA attacks. The presented calculation method is based 
on only very few parameters that can be assessed already at an early stage of 
the design process. 

It is therefore ideally suited to help designers to choose the right combination 
of countermeasures already at the beginning of the design process. Of course, 
the presented calculation method can also be used at any time during the design 
process to determine whether a design fulfills certain resistance requirements or 
not. The more precisely the parameters of the calculation can be determined, the more precise the statement on the number of samples becomes.

This article is organized as follows: Section 2 provides a short summary of 
the fundamentals of DPA attacks and defines some of the notation that is used 
in this article. Section 3 analyzes the principles that are used by hardware coun- 
termeasures to increase the resistance against DPA attacks. The calculation of 
lower bounds for the number of samples is presented in section 4. In section 5, 
the corresponding formulas are empirically verified. Conclusions can be found in 
section 6. 

2 Differential Power Analysis 

The power consumption of a digital circuit depends on the data that the circuit 
processes. Thus, the power trace of a device executing a cryptographic algorithm depends on intermediate results of this algorithm.

DPA attacks exploit the fact that every cryptographic algorithm has intermediate results which are a function of the ciphertext and only a few key bits. We call these key bits a subkey. In a DPA attack, one subkey after the other is attacked until the entire secret key is known or the missing rest of the key can be efficiently determined by a brute-force search. An attacker who knows the cryptographic algorithm that is executed in a device can reveal a subkey as follows:

First, the power consumption of the device is recorded while it encrypts $S$ different plaintexts using the same key. In this article, we use the common assumption that these plaintexts are uniformly distributed. We refer to the power traces that are recorded during the encryptions as $P_{1\ldots S,1\ldots T}$, where $T$ is the number of points that are recorded per encryption.

In the next step, the attacker chooses an intermediate result of the executed algorithm that is a function of the ciphertext and a short subkey. Based on the ciphertext and all possible values for the subkey, hypothetical values for the intermediate result are calculated. This leads to a matrix $I_{1\ldots K,1\ldots S}$ of hypothetical intermediate results, where $K$ is the number of possible values for the subkey.

The subkey $k_c$ that is actually used in the attacked device is one of the $K$ possible values for the subkey. Hence, the values $I_{k_c,1\ldots S}$ have actually been processed by the attacked device while it has been doing the $S$ recorded encryptions. Consequently, the values $P_{1\ldots S,t_c}$ depend on $I_{k_c,1\ldots S}$, where $t_c$ is the moment of time at which the attacked intermediate results have been processed.
The attacker determines a hypothetical power consumption value $H_{k,s}$ for every $I_{k,s}$. The absolute values of $H_{1\ldots K,1\ldots S}$ are of no importance for the attack; only the relative distance between the values is relevant.

Nevertheless, the calculation of $H_{1\ldots K,1\ldots S}$ requires some basic knowledge about how the processing of different data affects the power consumption of a device. Many devices use pre-charged buses. Such buses cause a power consumption that is proportional to the Hamming weight of the data block that is being transferred over the bus.

After having determined $H_{1\ldots K,1\ldots S}$, the attacker reveals the correct subkey $k_c$ by correlating the hypothetical power consumptions with that of the device. In this article, the Pearson correlation coefficient is used to measure this correlation.

In [10], Kocher et al. measure this correlation by calculating the distance between means. In the context of DPA attacks, there is no significant difference between the two measures for the correlation. However, we favor the Pearson correlation coefficient because there exists a well-established theory on measuring correlations this way: the Pearson correlation coefficient is the common measure to determine the linear relationship between two variables. Equation 1 shows a definition of the correlation $\rho$ between two variables $X$ and $Y$, where $E(X)$, $E(Y)$ and $E(XY)$ are expected values, $Cov(X,Y)$ is the covariance, and $Var(X)$ as well as $Var(Y)$ are the variances of the variables.



$$\rho(X,Y) = \frac{E(XY)-E(X)E(Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}} \qquad (1)$$



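The definition of the Pearson correlation coefficient $r$ is shown in equation 2. $r$ estimates the correlation $\rho$ between two variables based on $S$ samples; $\bar{x}$ and $\bar{y}$ in equation 2 denote the means of the variables based on $S$ samples.

$$r(\langle x_1,\ldots,x_S\rangle,\langle y_1,\ldots,y_S\rangle) = \frac{\sum_{s=1}^{S}(x_s-\bar{x})(y_s-\bar{y})}{\sqrt{\sum_{s=1}^{S}(x_s-\bar{x})^2}\,\sqrt{\sum_{s=1}^{S}(y_s-\bar{y})^2}} \qquad (2)$$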
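Equation 2 translates directly into code. The following plain-Python sketch computes $r$ for two equal-length sample sequences; the function name is illustrative. Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of equation (2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

print(pearson_r([1, 2, 3], [1, 3, 2]))
```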



In a DPA attack, the Pearson correlation coefficient between the values $H_{k,1\ldots S}$ and $P_{1\ldots S,t}$ is calculated for every fixed $k$ and $t$. This leads to the matrix $\mathcal{R} = r_{1\ldots K,1\ldots T}$ of correlation coefficients. Since the values $P_{1\ldots S,t}$ with $t \neq t_c$ and the hypotheses $H_{k,1\ldots S}$ with $k \neq k_c$ are largely uncorrelated with the processed intermediate results, the correlations $\rho_{k,t}$ with $k \neq k_c$ or $t \neq t_c$ are significantly lower than $\rho_{k_c,t_c}$.

If $S$ is sufficiently large in an attack, this difference between the correlations can be detected in the matrix $\mathcal{R}$ of Pearson correlation coefficients. In this case, one correlation coefficient of $\mathcal{R}$ is significantly larger than all other ones. The position of this peak in $\mathcal{R}$ reveals the correct subkey $k_c$.

The number of samples that is needed in a DPA attack to reveal $k_c$ is mainly determined by the value $\rho_{k_c,t_c}$. This observation has already been made previously by Messerges et al. in [15].
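The procedure of this section can be simulated on synthetic data. The sketch rests on illustrative assumptions that are not taken from the article: the attacked intermediate result is the output of an 8-bit S-box applied to x XOR k, where the S-box is a random permutation standing in for a real cipher's; the device leaks the Hamming weight of that value plus Gaussian noise at the single time $t_c$; and the attacker uses the Hamming-weight model for $H$. Only the column $t = t_c$ of $\mathcal{R}$ is computed.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient, as in equation (2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_dpa(correct_key=0x3C, samples=1000, noise_sd=1.0, seed=1):
    """Return the best-scoring subkey guess and the correlations r_{k,t_c} for k = 0..255."""
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)                                 # toy S-box (assumption)
    hw = [bin(v).count("1") for v in range(256)]      # Hamming-weight power model
    data = [rng.randrange(256) for _ in range(samples)]
    # Simulated power consumption at t_c: leakage of the processed value plus noise.
    power = [hw[sbox[x ^ correct_key]] + rng.gauss(0.0, noise_sd) for x in data]
    # Row k of the hypothetical power matrix H, correlated with the column P_{1..S,t_c}.
    r = [pearson_r([hw[sbox[x ^ k]] for x in data], power) for k in range(256)]
    best = max(range(256), key=lambda k: r[k])
    return best, r

best, r = simulate_dpa()
print(hex(best), round(r[best], 3))
```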




226 S. Mangard 



Since $\rho_{k_c,t_c}$ is the maximum value of $\rho_{1\ldots K,1\ldots T}$, we refer to this correlation as $\rho_{max}$ throughout the remainder of this article. The higher $\rho_{max}$ is, the fewer samples are needed to see a significant peak at the position $(k_c,t_c)$ of $\mathcal{R}$.

This is why it is the goal of hardware countermeasures to reduce $\rho_{max}$ to a value that is as close to zero as possible.

3 Hardware Countermeasures 

In order to increase the number of samples needed in DPA attacks, hardware 
countermeasures decrease the correlation between the hypothetical power con- 
sumptions and the power consumption of the device. 

The hypothetical power consumptions are determined by the attacker, and therefore they cannot be controlled by the designers of a device. Yet, designers can alter the power consumption of their devices in such a way that $\rho_{max}$ is reduced. There exist two possibilities to lower this correlation. All hardware countermeasures that have been proposed so far rely on these two possibilities.



3.1 Reduction of the SNR 

The first possibility to reduce the correlation $\rho_{max}$ is to bury the part of the power consumption that is caused by the processing of the attacked intermediate result in a lot of noise.

The burying of this signal in noise is best measured by a signal-to-noise ratio (SNR). For the definition of this SNR, we define $Q$ to be the power consumption caused by the attacked intermediate result and $N$ to be additive noise. Consequently, the power consumption of a device at the time $t_c$ can be written as $P_{s,t_c} = Q_s + N_s$.

Equation 3 shows the definition of the SNR for the signal Q. Since the DC 
components of N and Q are not relevant for the calculation of the correlation, 
only the AC components (i.e. the variances) of the signals are considered in this 
equation. 



$$SNR = \frac{Var(Q)}{Var(N)} \qquad (3)$$



The lower the SNR is, the lower is also the correlation between the correct 
hypothetical power consumption and the power consumption of the device. 
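The link between SNR and correlation can be made explicit under one extra assumption that the text does not state: the noise $N$ is independent of $Q$ and of the hypothetical power consumption $H$. Then $Cov(H, Q+N) = Cov(H, Q)$ and $Var(Q+N) = Var(Q)(1 + 1/SNR)$, so $\rho(H, P) = \rho(H, Q)/\sqrt{1 + 1/SNR}$. The sketch evaluates this relation and checks it with a small Monte Carlo simulation in which $Q$ is the Hamming weight of a uniformly random byte and $H = Q$; the function names are illustrative.

```python
import math
import random

def correlation_with_noise(rho_hq: float, snr: float) -> float:
    """rho(H, Q + N) for noise N independent of H and Q, with SNR = Var(Q)/Var(N)."""
    return rho_hq / math.sqrt(1.0 + 1.0 / snr)

def simulated_correlation(snr: float, samples: int = 20000, seed: int = 7) -> float:
    """Monte Carlo estimate with H = Q = Hamming weight of a uniform byte, so rho(H, Q) = 1."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(2.0 / snr)           # Var(Q) = 8 * 1/4 = 2 for a uniform byte
    q = [bin(rng.randrange(256)).count("1") for _ in range(samples)]
    p = [x + rng.gauss(0.0, noise_sd) for x in q]
    mq, mp = sum(q) / samples, sum(p) / samples
    cov = sum((a - mq) * (b - mp) for a, b in zip(q, p))
    return cov / math.sqrt(sum((a - mq) ** 2 for a in q) * sum((b - mp) ** 2 for b in p))

for snr in (10.0, 1.0, 0.1):
    print(snr, round(correlation_with_noise(1.0, snr), 3), round(simulated_correlation(snr), 3))
```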

There are several hardware countermeasures that reduce the SNR. The most prominent examples are special logic styles that minimize the data dependency of the power consumption. Such logic styles are presented by Moore et al. in [16,17], by Tiri et al. in [20,21] and by Saputra et al. in [19].

However, there are many more ways to reduce the SNR. For example, flattening the power consumption or randomly charging on-chip capacitances also reduces the SNR. In fact, any processing that occurs in parallel with the execution of the cryptographic algorithm has this effect.
In general, the effect of a hardware countermeasure on the SNR can be determined already at an early stage of the design process. Even before the implementation of a device has started, it is possible to assess the SNR. During the implementation phase, the SNR can be determined by using tools that assess the power consumption of a device. Because the overall power consumption of integrated circuits has become increasingly important during the last few years, several tools of this kind are available.

3.2 Random Disarrangement of $t_c$

The second possibility to reduce the correlation $\rho_{max}$ is to randomly disarrange the moment of time at which the attacked intermediate result is processed. If the time $t_c$ is different in every power trace, the correlation between the correct hypothetical power consumption and that of the device is significantly reduced.

Random disarrangement techniques result in a certain probability distribution for $t_c$. Clearly, the highest correlation in DPA attacks occurs at the moment of time of the power traces where the maximum of this probability distribution is located. We refer to this moment of time as $\hat{t}_c$. The maximum probability $p$ that is located at $\hat{t}_c$ is the decisive value determining how much the correlation is reduced in DPA attacks. The lower $p$ is, the more samples are required in DPA attacks.

There exist many proposals for hardware countermeasures that are based on
a random disarrangement of $t_c$. The classic countermeasure based on this
principle is the insertion of random delays [3], which can even be implemented in
software. Another approach that is also based on randomizing $t_c$ is pursued by
Irwin et al. in [8] and by May et al. in [12]. They propose to use a non-deterministic
processor to foil DPA attacks.

The countermeasure proposed by Benini et al. in [2] also gains most of its
strength by randomizing $t_c$. Asynchronous logic styles [16,17] are also very
well suited for the insertion of non-deterministic delays.

In order to determine the effect of a random disarrangement of $t_c$ on DPA
attacks early in the design process, $p$ must also be determined very early.

In the case of the insertion of random delays, $t_c$ is binomially distributed, so
$p$ can be calculated in a straightforward manner. For the other countermeasures,
the distribution of $t_c$ may be more complex. Yet, even if a direct calculation
of $p$ is not practical, it is always possible to approximate it empirically based
on a software model of the countermeasure.
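As a sketch of both routes for the random-delay case, the Python snippet below computes $p$ exactly as the largest probability of a binomial distribution and, alternatively, estimates it from a software model by counting how often the most frequent value of $t_c$ occurs. The delay model `binomial_delay` (50 independent fair choices of one extra clock cycle, mirroring the parameters used in Section 5) and all helper names are illustrative assumptions, not part of the original method.

```python
import math
import random
from collections import Counter
from typing import Callable


def max_binomial_probability(n: int, prob: float = 0.5) -> float:
    """Exact p for t_c ~ Binomial(n, prob): the probability at the mode."""
    mode = min(math.floor((n + 1) * prob), n)  # floor((n+1)*prob) is a mode
    return math.comb(n, mode) * prob**mode * (1 - prob) ** (n - mode)


def estimate_p(sample_tc: Callable[[random.Random], int],
               trials: int = 20_000, seed: int = 1) -> float:
    """Empirical p: relative frequency of the most common t_c in a software model."""
    rng = random.Random(seed)
    counts = Counter(sample_tc(rng) for _ in range(trials))
    return max(counts.values()) / trials


def binomial_delay(rng: random.Random, n: int = 50) -> int:
    """Hypothetical delay model: each of n slots adds one cycle with probability 1/2."""
    return sum(rng.getrandbits(1) for _ in range(n))


if __name__ == "__main__":
    print(f"exact p     = {max_binomial_probability(50):.4f}")
    print(f"estimated p = {estimate_p(binomial_delay):.4f}")
```

For a countermeasure whose $t_c$ distribution has no closed form, only the model passed as `sample_tc` needs to be replaced.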

In this section, we have introduced the possibilities that can be used to lower
the correlation $\rho_{\max}$ in DPA attacks. There are two properties of (combinations
of) hardware countermeasures that largely determine their effect on the number
of samples: the SNR defined in equation 3 and $p$.

Both properties can already be assessed at an early stage of the design pro-
cess. The following section introduces the calculation of lower bounds for the
number of samples based on these two parameters.
4 Calculation of Lower Bounds for the Number of Samples

The effect of hardware countermeasures on the number of samples is largely
determined by the two parameters discussed in section 3. However, there are also
some other parameters that have a certain influence on the number of samples.
Throughout this article, these parameters are set to worst-case values from a
designer's point of view (i.e., all unknown parameters are set in favor of a
potential attacker). Hence, the calculation method introduced in this section
leads to lower bounds for the number of samples. This conservative measure is
exactly what designers should use to determine the effectiveness of the hardware
countermeasures in their design.

In the following subsection, formulas are first derived to calculate $\rho_{\max}$ in
the presence of hardware countermeasures. Subsection 4.2 then introduces the
calculation of lower bounds for the number of samples based on $\rho_{\max}$.

4.1 $\rho_{\max}$ in the Presence of Hardware Countermeasures

The Effect of SNR on $\rho_{\max}$. In a DPA attack on a device without random
disarrangement of $t_c$, $\rho_{\max}$ is the correlation between the hypothetical power
consumption for the correct subkey and the power consumption of the device at
the time $t_c$.

Equation 4 shows the calculation of $\rho_{\max}$ based on the SNR. In this equation,
the variable $H$ refers to the hypothetical power consumption for the correct
subkey. $Q$ and $N$ are used as defined in section 3: $Q$ denotes the power
consumption of the device caused by the attacked intermediate result, and $N$
denotes uncorrelated additive noise. Because $N$ is uncorrelated with $H$,
$E(HN) = E(H)E(N)$.

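$$\rho(H, Q+N) = \frac{\mathrm{Cov}(H, Q+N)}{\sqrt{\mathrm{Var}(H)\,(\mathrm{Var}(Q)+\mathrm{Var}(N))}}
= \frac{E(HQ+HN) - E(H)\,(E(Q)+E(N))}{\sqrt{\mathrm{Var}(H)\,\mathrm{Var}(Q)}\,\sqrt{1+\frac{1}{SNR}}}
= \frac{\rho(H,Q)}{\sqrt{1+\frac{1}{SNR}}} \qquad (4)$$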
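Equation 4 can be checked numerically. The Python sketch below assumes, for illustration only, that the device leaks the Hamming weight of a uniformly random byte without model error (so $Q = H$ and $\rho(H,Q) = 1$), adds Gaussian noise $N$, and reads the SNR as $\mathrm{Var}(Q)/\mathrm{Var}(N)$, which is the form the last step of equation 4 implies. The measured correlation should then approach $1/\sqrt{1+1/SNR}$.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def predicted_rho(rho_hq: float, snr: float) -> float:
    """Equation 4: rho(H, Q + N) = rho(H, Q) / sqrt(1 + 1/SNR)."""
    return rho_hq / math.sqrt(1.0 + 1.0 / snr)


def simulate(snr: float, samples: int = 50_000, seed: int = 0) -> float:
    """Measured rho(H, Q + N) for a Hamming-weight leak with Q = H (rho(H, Q) = 1)."""
    rng = random.Random(seed)
    var_q = 2.0                          # variance of the Hamming weight of a random byte
    sigma_n = math.sqrt(var_q / snr)     # assumed SNR = Var(Q) / Var(N)
    h, leak = [], []
    for _ in range(samples):
        hw = bin(rng.getrandbits(8)).count("1")
        h.append(hw)                                 # hypothetical power consumption H
        leak.append(hw + rng.gauss(0.0, sigma_n))    # device power consumption Q + N
    return pearson(h, leak)


for snr in (0.1, 1.0, 10.0):
    print(snr, round(predicted_rho(1.0, snr), 3), round(simulate(snr), 3))
```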

The Effect of $p$ on $\rho_{\max}$. If $t_c$ is randomly disarranged, the correlation $\rho_{\max}$
occurs between the correct hypothetical power consumption and the power
consumption of the device at the time $\hat{t}_c$.

In equation 5, the variable $\hat{P}$ refers to the power consumption of the device
at this time $\hat{t}_c$. The probability that the power consumption at this time is
caused by the processing of the attacked intermediate result is $p$. With a
probability of $(1-p)$, the power consumption at the time $\hat{t}_c$ is caused by the
processing of some other data.

In equation 5, we refer to the power consumption caused by the attacked
intermediate result as $P$, and to the power consumption caused by the processing
of other data as $O$. In practice, $O$ is largely independent of the correct
hypothetical power consumption $H$. This is why we set $\mathrm{Cov}(H, O)$ to zero in
equation 5.
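$$\rho(H, \hat{P}) = \frac{p \cdot \mathrm{Cov}(H,P) + (1-p) \cdot \mathrm{Cov}(H,O)}{\sqrt{\mathrm{Var}(H)\,\mathrm{Var}(\hat{P})}}
= \frac{p \cdot \mathrm{Cov}(H,P)}{\sqrt{\mathrm{Var}(H)\,\mathrm{Var}(\hat{P})}}
= \rho(H,P) \cdot p \cdot \frac{\sqrt{\mathrm{Var}(P)}}{\sqrt{\mathrm{Var}(\hat{P})}} \qquad (5)$$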



Calculation of $\rho_{\max}$. Equations 4 and 5 can be combined into one formula
(equation 6) that allows designers to determine the effect of a given combination
of hardware countermeasures on $\rho_{\max}$.



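$$\rho_{\max} = \frac{\rho(H,Q)}{\sqrt{1+\frac{1}{SNR}}} \cdot p \cdot \frac{\sqrt{\mathrm{Var}(P)}}{\sqrt{\mathrm{Var}(\hat{P})}} \qquad (6)$$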



Besides the parameters SNR and $p$, the correlation $\rho(H,Q)$ and the factor
$F = \sqrt{\mathrm{Var}(P)}/\sqrt{\mathrm{Var}(\hat{P})}$ also influence $\rho_{\max}$. While the correlation
$\rho(H,Q)$ solely depends on how well the attacker knows the power consumption
characteristics of a device, the factor $F$ is a device-specific property.

However, unlike SNR and $p$, $F$ is rather difficult to assess at very early
stages of the design process. In order to reasonably assess $F$, designers need some
knowledge of what the power consumption of the device looks like before and
after the attacked intermediate result is processed. The wider the probability
distribution of $t_c$, the bigger the range that needs to be known.

In practice, $F$ should be set to the worst-case value 1 at the very early stages
of the design process. As soon as first assessments of the power consumption of
the device are available, $F$ can be updated accordingly in the calculation of the
number of samples.

Since designers should always determine the number of samples in a conser-
vative manner, $\rho(H,Q)$ should be set to 1 throughout the design process.

Based on equation 6, $\rho_{\max}$ can be determined at any point of the design
process. The better the parameters of this equation can be assessed, the more
accurate the statement on the number of needed samples becomes.
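A direct transcription of equation 6, with the conservative defaults recommended above ($\rho(H,Q) = 1$ and, early on, $F = 1$), fits in a short Python function. The SNR, $p$ and $F$ values in the usage lines are hypothetical design-stage estimates, not figures from this article.

```python
import math


def rho_max(snr: float, p: float, f: float = 1.0, rho_hq: float = 1.0) -> float:
    """Equation 6: rho_max = rho(H,Q) / sqrt(1 + 1/SNR) * p * F."""
    if snr <= 0 or not 0 < p <= 1 or f <= 0:
        raise ValueError("expected SNR > 0, 0 < p <= 1 and F > 0")
    return rho_hq / math.sqrt(1.0 + 1.0 / snr) * p * f


# Hypothetical early estimate: SNR = 0.5 and p = 0.1, with worst-case F = 1
print(rho_max(snr=0.5, p=0.1))
# Hypothetical later refinement once F has been assessed (F = 0.5 is assumed)
print(rho_max(snr=0.5, p=0.1, f=0.5))
```

Because equation 6 is a plain product, refining any single parameter later in the design simply rescales the earlier estimate.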



4.2 Mapping $\rho_{\max}$ to a Number of Samples

The number of samples needed in a DPA attack is the commonly used measure
for the resistance of a device against these attacks. In order to reveal the correct
subkey $k_c$, the number of samples in an attack needs to be increased until a
significant peak is visible in the matrix $\mathcal{R}$.

The Pearson correlation coefficients in this matrix $\mathcal{R}$ estimate the correlations
$\rho_{1\ldots K,1\ldots T}$ based on $S$ samples. The sampling distribution of a Pearson
correlation coefficient $r$ is best described by transforming $r$ to a variable $z$ that is
approximately normally distributed. This transformation (known as Fisher's
Z-transformation) is shown in equations 7 to 9.



$$z = \frac{1}{2}\ln\frac{1+r}{1-r} \qquad (7)$$

$$\mu = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} \qquad (8)$$

$$\sigma^2 = \frac{1}{S-3} \qquad (9)$$



Based on these formulas, the sampling distribution of each correlation coefficient
$r$ of the matrix $\mathcal{R}$ can easily be determined from $\rho_{1\ldots K,1\ldots T}$ and $S$: the
transformed value $z$ has mean $\frac{1}{2}\ln\frac{1+\rho}{1-\rho}$ and variance $\frac{1}{S-3}$. Equations 7
to 9 are an approximation that holds for $S > 30$. Since the number of samples is
typically much higher, this approximation is sufficient.
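The following Python sketch illustrates equations 7 to 9: it applies Fisher's transformation to sample correlations of independent data ($\rho = 0$) and compares the spread of the resulting $z$ values with $1/\sqrt{S-3}$. The Gaussian test data, the choice $S = 100$ and the helper names are illustrative assumptions.

```python
import math
import random
import statistics


def fisher_z(r: float) -> float:
    """Equation 7 (identical to math.atanh(r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulated_z_std(s: int, trials: int = 2000, seed: int = 7) -> float:
    """Standard deviation of z over repeated experiments with S samples and rho = 0."""
    rng = random.Random(seed)
    zs = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(s)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(s)]  # independent of xs, so rho = 0
        zs.append(fisher_z(pearson(xs, ys)))
    return statistics.stdev(zs)


S = 100
print(simulated_z_std(S), 1 / math.sqrt(S - 3))  # equation 9 predicts the second value
```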

Calculating the exact number of samples that are needed for a DPA attack
is quite difficult in practice, for several reasons. First of all, the designers of a
cryptographic device do not know the sampling rate to which an attacker will set
the oscilloscope, nor how long the recorded power traces will be. Clearly, these
parameters strongly influence the size and the values of the matrix $\rho_{1\ldots K,1\ldots T}$.

Even if we assumed that the designers knew $\rho_{1\ldots K,1\ldots T}$, they would
still not know the correlation between the values of this matrix. Yet, these values
are significantly correlated in practice.

In order to calculate a lower bound for the number of samples based only on
$\rho_{\max}$, we use the following observation: the number of samples that is needed
to see a peak in practice is mainly determined by the distance between the
sampling distributions with $\rho = 0$ and $\rho = \rho_{\max}$. All values of $\mathcal{R}$ are drawn
from one of these two sampling distributions. Clearly, the more these distributions
overlap, the less likely it is to see a significant peak in $\mathcal{R}$. An attacker can
decrease this overlap by increasing the number of samples (see equation 9).

In order to measure the distance between the distributions, we calculate the
probability that a value drawn from the distribution with $\rho = \rho_{\max}$ is bigger
than one drawn from the distribution with $\rho = 0$. This probability $\alpha$ can be
calculated as shown in equation 10. This equation can be transformed to
equation 11, which allows a direct calculation of the number of samples based
on $\rho_{\max}$.



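$$\alpha = \Phi\left(\frac{\frac{1}{2}\ln\frac{1+\rho_{\max}}{1-\rho_{\max}} - \frac{1}{2}\ln\frac{1+0}{1-0}}{\sqrt{\frac{2}{S-3}}}\right) \qquad (10)$$

$$S = 3 + 8\left(\frac{z_\alpha}{\ln\frac{1+\rho_{\max}}{1-\rho_{\max}}}\right)^2 \qquad (11)$$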
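The transformation from equation 10 to equation 11 takes two steps. The term $\sqrt{2/(S-3)}$ in equation 10 is the standard deviation of the difference of two $z$ values that are treated as independent, each with variance $1/(S-3)$ by equation 9. Since $\frac{1}{2}\ln\frac{1+0}{1-0} = 0$, writing $z_\alpha = \Phi^{-1}(\alpha)$ for the $\alpha$-quantile of the standard normal distribution and solving for $S$ gives (for $\alpha > \frac{1}{2}$ and $\rho_{\max} > 0$, where both sides are positive):

$$
z_\alpha = \frac{\frac{1}{2}\ln\frac{1+\rho_{\max}}{1-\rho_{\max}}}{\sqrt{\frac{2}{S-3}}}
\;\Longleftrightarrow\;
\frac{S-3}{2} = \frac{z_\alpha^2}{\frac{1}{4}\ln^2\frac{1+\rho_{\max}}{1-\rho_{\max}}}
\;\Longleftrightarrow\;
S = 3 + 8\left(\frac{z_\alpha}{\ln\frac{1+\rho_{\max}}{1-\rho_{\max}}}\right)^2
$$

For reference, $z_{0.9} \approx 1.2816$ and $z_{0.9999} \approx 3.7190$.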
The quantile $z_\alpha$ of the standard normal distribution $\Phi$ (i.e., $\Phi(z_\alpha) = \alpha$)
determines the distance between the distributions with $\rho = 0$ and $\rho = \rho_{\max}$.
The higher the probability $\alpha$, the bigger the distance between the distributions
and, consequently, the more likely it is to see a peak.

In practice, several values are drawn from each of these distributions. Yet, 
these values are not drawn independently. Therefore, it is hard to calculate 
the exact probability for a peak, and we have to rely on empirical results to 
approximate a lower bound for the number of needed samples. 

Based on several practical attacks and simulations, we have determined that
$\alpha = 0.9$ is a reasonable value to calculate a lower bound for the number of
samples needed in a DPA attack. Setting $\alpha = 0.9999$ in equation 11, on the other
hand, leads to a number of samples that reveals the attacked subkey with very
high probability. Between $\alpha = 0.9$ and $\alpha = 0.9999$ there is a “gray area”: the
lower the value of $\alpha$, the lower the probability of observing a significant
peak in the correlation trace $r_{k_c,1\ldots T}$. The levels $\alpha = 0.9$ and $\alpha = 0.9999$ have
been chosen in a very conservative way.

In order to get more exact bounds for a particular device, the levels of $\alpha$ may be
refined as soon as simulated or measured power traces of the device are available.

Based on the formulas provided in this subsection, designers of cryptographic
devices can determine the effect of hardware countermeasures on the
number of samples as follows:

First, $\rho_{\max}$ is calculated according to equation 6. The parameters needed
for this calculation are assessed conservatively by the designers, as well as
possible at the respective stage of the design process. Based on $\rho_{\max}$, a lower
bound for the number of needed samples can then be calculated according to
equation 11.
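The two-step procedure can be sketched in a few lines of Python, with `statistics.NormalDist().inv_cdf` supplying the quantile $z_\alpha$. The article does not state how the value of equation 11 is rounded; rounding to the nearest integer is an assumption here (it reproduces Table 2 in Section 5). The SNR and $p$ values in the example call are hypothetical.

```python
import math
from statistics import NormalDist


def rho_max(snr: float, p: float, f: float = 1.0, rho_hq: float = 1.0) -> float:
    """Step 1, equation 6 (conservative defaults rho(H,Q) = 1 and F = 1)."""
    return rho_hq / math.sqrt(1.0 + 1.0 / snr) * p * f


def samples_needed(rho: float, alpha: float) -> float:
    """Step 2, equation 11: S = 3 + 8 * (z_alpha / ln((1 + rho) / (1 - rho)))**2."""
    z_alpha = NormalDist().inv_cdf(alpha)
    return 3 + 8 * (z_alpha / math.log((1 + rho) / (1 - rho))) ** 2


def assess(snr: float, p: float, f: float = 1.0) -> dict:
    """Lower bound (alpha = 0.9) and high-confidence value (alpha = 0.9999)."""
    r = rho_max(snr, p, f)
    return {alpha: round(samples_needed(r, alpha)) for alpha in (0.9, 0.9999)}


# Hypothetical design point: SNR = 0.05 and random delays with p = 0.1
print(assess(snr=0.05, p=0.1))
```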



5 Empirical Verification 

In order to empirically verify the formulas derived in the last section, we
implemented AES-128 on an 8-bit microcontroller. The microcontroller was clocked
at 11 MHz, and its power consumption was sampled at 250 MS/s during 4000
AES-128 encryptions.

We attacked an 8-bit intermediate result of AES-128 at the time it was
transferred over the precharged bus of the microcontroller. In order to verify
equation 6, different numbers of bits of this intermediate result were attacked.
From an attacker's point of view, the bits that are transferred over the bus but
are not part of the attacked intermediate result are noise. Of course, besides the
power consumption of these bits, there is also other noise in the measurement.

However, since we are not familiar with the details of the design of the
microcontroller, we had to assume that this additional noise is zero for our first
calculation of $\rho_{\max}$ based on equation 12. In this equation, $b$ is the number of
bits that are attacked on the bus, and $n$ is the variable representing the additional
noise in the power traces.
Table 1. Comparison of calculated correlations with the empirically determined
correlation coefficients for 4000 samples

Number of attacked bits            1     2     3     4     5     6     7     8
Calculated ρmax (n = 0)         0.35  0.50  0.61  0.71  0.79  0.87  0.94  1.00
Calculated ρmax (n = 2)         0.32  0.45  0.55  0.63  0.71  0.77  0.84  0.89
Measured rmax (4000 samples)    0.31  0.44  0.53  0.63  0.70  0.76  0.82  0.90



$$\rho_{\max} = \frac{1}{\sqrt{1+\frac{n+(8-b)}{b}}} \qquad (12)$$



The first line of Table 1 shows $\rho_{\max}$ for $n = 0$ and $b = 1, \ldots, 8$. The second
line shows the corresponding values for $n = 2$. The correlation coefficients
that we determined empirically by performing a DPA attack with 4000 samples
can be found in line three.

The values of $\rho_{\max}$ calculated with $n = 0$ are higher than the ones determined
empirically. This is a logical consequence of the fact that no additional noise was
assumed for the calculation. However, the values in the second and third lines
match almost exactly; setting $n = 2$ obviously models the noise of the
microcontroller very well. The slight deviations between lines two and three are a
consequence of the fact that not all wires of the bus of the microcontroller have
the same power consumption characteristics.
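The two calculated rows of Table 1 follow directly from equation 12, as the Python lines below show. Reading $n$ as additional noise variance expressed in units of the contribution of one bus wire is our interpretation of the form of equation 12, not a statement about the measurements.

```python
import math


def rho_max_bus(b: int, n: float) -> float:
    """Equation 12: b attacked bits on an 8-bit bus plus additional noise n."""
    return 1 / math.sqrt(1 + (n + (8 - b)) / b)


for n in (0, 2):
    row = [round(rho_max_bus(b, n), 2) for b in range(1, 9)]
    print(f"n = {n}: {row}")
```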

Based on the microcontroller, we also verified the effect of random delays on
$\rho_{\max}$. For this purpose, we disarranged the 4000 power traces using a binomial
distribution with $P = \frac{1}{2}$ and $n = 50$ clock cycles for the delay. The maximum
probability of this distribution is $p = \binom{50}{25} 2^{-50}$. However, when calculating $p$,
we had to take into account that the microcontroller we used processes the
attacked intermediate result twice, in two subsequent clock cycles. Consequently,
$p$ was approximated by $2\binom{50}{25} 2^{-50}$. We attacked a 4-bit intermediate result
using the disarranged traces. Hence, $\rho_{\max}$ could be determined as shown in
equation 13.

$$\rho_{\max} = \frac{1}{\sqrt{1+\frac{2+(8-4)}{4}}} \cdot 2\binom{50}{25} 2^{-50} \cdot \frac{\sqrt{\mathrm{Var}(P)}}{\sqrt{\mathrm{Var}(\hat{P})}} = 0.06 \qquad (13)$$

The quotient $\sqrt{\mathrm{Var}(P)}/\sqrt{\mathrm{Var}(\hat{P})}$ was determined empirically. Performing
1000 attacks with different random delays based on 4000 power traces of the
microcontroller led to a mean of $r_{\max} = 0.063$.

In the next step, we verified the calculation of the number of samples. We
calculated $S$ based on equation 11 for the attacks without random delays that
we described before. The calculated numbers of samples for $\alpha = 0.9$ and
$\alpha = 0.9999$ are shown in Table 2 ($\rho_{\max}$ was calculated using equation 12 with
$n = 2$). The big difference between the numbers of samples for the same attack with
Table 2. The number of samples needed in a DPA attack on the microcontroller for
$\alpha = 0.9$ and $\alpha = 0.9999$

Number of attacked bits           1     2     3     4     5     6     7     8
S calculated with α = 0.9        34    17    12     9     7     6     5     5
S calculated with α = 0.9999    261   122    76    53    39    29    22    16



different values of $\alpha$ again shows that the levels $\alpha = 0.9$ and $\alpha = 0.9999$ have been
chosen very conservatively.
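Table 2 can be recomputed from equations 11 and 12 (with $n = 2$). In the Python sketch below, `statistics.NormalDist().inv_cdf` gives $z_\alpha$; rounding $S$ to the nearest integer is an assumption that happens to match every entry of the table.

```python
import math
from statistics import NormalDist


def rho_max_bus(b: int, n: float = 2) -> float:
    """Equation 12 (n = 2 as used for Table 2)."""
    return 1 / math.sqrt(1 + (n + (8 - b)) / b)


def samples_needed(rho: float, alpha: float) -> float:
    """Equation 11."""
    z_alpha = NormalDist().inv_cdf(alpha)
    return 3 + 8 * (z_alpha / math.log((1 + rho) / (1 - rho))) ** 2


table2 = {
    alpha: [round(samples_needed(rho_max_bus(b), alpha)) for b in range(1, 9)]
    for alpha in (0.9, 0.9999)
}
for alpha, row in table2.items():
    print(f"alpha = {alpha}: {row}")
```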

We have performed DPA attacks with the calculated numbers of samples.
Clearly visible peaks occurred in the attacks conducted with the numbers of
samples calculated for $\alpha = 0.9999$. In hundreds of attacks with the numbers of
samples shown in the first line of Table 2, only some sporadic peaks occurred.

Hence, based on attacks on an 8-bit microcontroller, we have been able to
empirically verify all formulas presented in this article.



6 Conclusions 

Designers of cryptographic devices require methods to assess the effect of hard- 
ware countermeasures on the number of samples needed in DPA attacks. Such 
methods are necessary in order to avoid costly design iterations. 

In this article, we have identified those properties of hardware countermea- 
sures that affect the number of samples needed in DPA attacks. Based on these 
properties, we have derived formulas that allow the calculation of lower bounds 
for the number of samples needed in DPA attacks. 

The presented formulas enable designers to assess the resistance of their
devices against DPA attacks from the earliest stages of the design process
until fabrication. This way, designers can verify that the combination of
countermeasures they have chosen to implement in their devices indeed provides
the required protection against DPA attacks.
Abstract. Exponentiation is a central process in many public-key
cryptosystems such as RSA and DH. This paper introduces the concept of
self-randomized exponentiation as an efficient means for preventing DPA-type
attacks. Self-randomized exponentiation features several interesting
properties:

– it is fully generic in the sense that it is not restricted to a particular
exponentiation algorithm;

– it is parameterizable: a parameter allows one to choose the best trade-off
between security and performance;

– it can be combined with most other counter-measures;

– it is space-efficient, as only an additional long-integer register is
required;

– it is flexible in the sense that it does not rely on certain group
properties;

– it does not require prior knowledge of the order of the group in
which the exponentiation is performed.

All these advantages make our method particularly well suited to 
secure implementations of the RSA cryptosystem in standard mode, on 
constrained devices like smart cards. 

Keywords: Exponentiation, implementation attacks, fault attacks, side- 
channel attacks (DPA, SPA), randomization, exponent masking, blind- 
ing, RSA, standard mode, smart cards. 



1 Introduction 

Since the invention of public-key cryptography by Diffie and Hellman [DH76],
numerous public-key cryptosystems have been proposed. Amongst those that resisted
cryptanalysis, the RSA cryptosystem [RSA78] is undoubtedly the most widely
used. Its intrinsic security relies on the difficulty of factoring large integers. In
spite of decades of intensive research, the factoring problem is still considered
a very hard problem, making the RSA cryptosystem secure for sensitive
applications such as data encryption or digital signatures [PKC02].

Instead of trying to break RSA at a mathematical level, cryptographers
then turned their attention to concrete implementations of RSA cryptosystems.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 236-249, 2004.

© Springer-Verlag Berlin Heidelberg 2004
This gave rise to fault attacks [BDL01] and side-channel attacks [Koc96,KJJ99].
Implementation attacks profoundly modified the way algorithms should be
implemented.

As a general rule of thumb for preventing implementation attacks, algorithms
should be randomized. In the case of the RSA cryptosystem, there are basically
two approaches for randomizing the computation of $y = x^d \pmod{N}$. This can
be achieved by:

1. randomizing the input data prior to executing the exponentiation
algorithm [Koc96]; e.g., as

a) $\hat{x} \leftarrow x + r_1 N$ for a $k$-bit random $r_1$;

b) $\hat{d} \leftarrow d + r_2\,\phi(N)$ for a $k$-bit random $r_2$;

and then $y$ is evaluated as $y = \hat{y} \pmod{N}$ with $\hat{y} = \hat{x}^{\hat{d}} \pmod{2^k N}$;

2. randomizing the exponentiation algorithm itself (e.g., [Wal02], [MDS99]).
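Approach 1 can be illustrated with Python's built-in three-argument `pow`. The sketch below blinds both the base and the exponent as in items a) and b) and reduces the result modulo $N$ at the end. The toy RSA key ($N = 61 \cdot 53$, $\phi(N) = 3120$, $d = 2753$) and the choice $k = 32$ are illustrative assumptions, and the function name is hypothetical.

```python
import secrets


def blinded_exponentiation(x: int, d: int, n: int, phi_n: int, k: int = 32) -> int:
    """Compute x^d mod n on randomized inputs (approach 1, items a and b).

    Correctness relies on x^phi(n) = 1 (mod n) for gcd(x, n) = 1; for a
    squarefree RSA modulus the result also holds for other x when d >= 1.
    """
    r1 = secrets.randbits(k)
    r2 = secrets.randbits(k)
    x_hat = x + r1 * n                         # a) x_hat <- x + r1*N
    d_hat = d + r2 * phi_n                     # b) d_hat <- d + r2*phi(N)
    y_hat = pow(x_hat, d_hat, (1 << k) * n)    # y_hat = x_hat^d_hat mod 2^k N
    return y_hat % n                           # y = y_hat mod N


# Toy RSA key: N = 61 * 53, phi(N) = 60 * 52, e = 17, d = 2753
N, PHI, D = 3233, 3120, 2753
print(blinded_exponentiation(2790, D, N, PHI), pow(2790, D, N))
```

Because $2^k N$ is a multiple of $N$, reducing $\hat{y}$ modulo $N$ gives the same value as computing directly modulo $N$.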

The first approach, initiated by Kocher (see [Koc96, Section 10]), presents the
advantage of being independent of the exponentiation algorithm. It is also worth
noting that when $x$ is the result of a probabilistic padding (e.g., OAEP [BR95] or
PSS [BR96]), there is no need to further randomize $x$, and so the exponentiation
can, for example, be carried out as $y = x^{\hat{d}} \pmod{N}$ with $\hat{d} = d + r_2\,\phi(N)$ for
a random $r_2$. Unfortunately, such a randomization of $d$ is restricted to CRT
implementations of RSA [QC82], as the value of Euler's totient function $\phi(N)$
is usually unknown to the private exponentiation algorithm in standard (i.e.,
non-CRT) mode.¹

The best representative of the second approach is the Mist algorithm by
Walter [Wal02]. Mist randomly generates a fresh addition chain for exponent $d$
for performing $x^d \pmod{N}$. To minimize the number of registers, the addition
chain is computed on-the-fly via an adaptation of an exponentiation algorithm
based on “division chains” [Wal98]. Another example is an improved version
of the sliding window method proposed in [IYTT02]. Compared to the first
approach, the second approach allows randomizing the exponentiation without
knowledge of $\phi(N)$, but requires a secure division algorithm for computing the
division chains or quite complicated management.

This paper presents a novel method to randomize the execution of the
exponentiation in order to prevent Differential Power Analysis (DPA) [KJJ99],
combining the advantages of the two approaches: as in the first approach, it
does not impose a particular exponentiation algorithm; and as in the second
approach, it is a randomized algorithm (in particular, it requires knowledge of
neither $\phi(N)$ nor $e$ in a private RSA exponentiation). Our method introduces
the concept of self-randomized exponentiation, meaning that exponent
$d$ is itself used as an additional source of randomness in the exponentiation
process. Self-randomized exponentiation only assumes that exponent bits are
¹ When the public exponent $e$ is known and not too large, one can randomize the
private exponent as $d \leftarrow d + r(ed - 1)$. Unfortunately, in most cases, $e$ is unknown
(i.e., not available to the private exponentiation algorithm).
scanned from the most significant position and so applies to most
exponentiation algorithms [MvV97, Chapter 14]. It can also be combined with most other
counter-measures such as randomizing the exponent prior to the
exponentiation. Finally, our method is not restricted to exponentiation in RSA groups and
equally applies to other groups such as the group of points of an elliptic curve
over a finite field [Kob87,Mil86].

The rest of this paper is organized as follows. The next section briefly
reviews exponentiation algorithms and presents the general principle behind
self-randomized exponentiation. In Section 3, two different self-randomized
exponentiation algorithms (and variants thereof) are detailed. Section 4 presents
equivalent versions without branching instructions, so that Simple Power
Analysis (SPA) [KJJ99] is also prevented. It also presents a version that resists
a powerful attacker able to “reverse” the exponentiation algorithm, along
with further optimizations. Finally, Section 5 concludes the paper.

2 Self-Randomized Exponentiation 

2.1 Classical Exponentiation Algorithms 

There exist two main families of exponentiation algorithms for evaluating the
value of $y = x^d \pmod{N}$, according to the direction in which the bits of exponent $d$
are scanned. This paper is only concerned with left-to-right algorithms (i.e.,
scanning $d$ from the most significant position to the least significant position),
including the square-and-multiply algorithm and its $k$-ary variants, the
sliding-window algorithms, etc. (see [MvV97, Chapter 14]). Left-to-right algorithms
require less memory and allow the use of precomputed powers $x^j \pmod{N}$
for speeding up the computation of $y$.

2.2 General Principle 

Let $d = (d_l, \ldots, d_0)_2 = \sum_{i=0}^{l} d_i\, 2^i$ (with $d_i \in \{0, 1\}$) denote the binary
representation of exponent $d$. Defining

$$d_{k \to j} := (d_k, \ldots, d_j)_2 = \sum_{k \ge i' \ge j} d_{i'}\, 2^{i'-j},$$

left-to-right exponentiation algorithms share the common feature that an
accumulator is used throughout the computation for storing the value of
$x^{d_{l \to i}} \pmod{N}$ for decreasing $i$'s until the accumulator contains the value of
$y = x^{d_{l \to 0}} \pmod{N}$.

For example, the square-and-multiply algorithm exploits the recurrence
relation

$$x^{d_{l \to i}} = \left(x^{d_{l \to i+1}}\right)^2 \cdot x^{d_i}$$

with $x^{d_{l \to l}} = x^{d_l}$. Therefore, writing at iteration $i$ the value of $x^{d_{l \to i}}$ in
accumulator $R_0$, we obtain the algorithm of Fig. 1.
Input: x, d = (d_l, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← x; i ← l
while (i ≥ 0) do
    R0 ← R0 · R0 (mod N)
    if (d_i = 1) then R0 ← R0 · R1 (mod N)
    i ← i − 1
endwhile
return R0



Fig. 1. Square-and-multiply algorithm
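For reference, Fig. 1 translates directly into Python (a functional illustration only; Python integer arithmetic offers no side-channel guarantees). After iteration $i$, the variable `r0` holds $x^{d_{l \to i}} \bmod N$, exactly as in the recurrence above.

```python
def square_and_multiply(x: int, d: int, n: int) -> int:
    """Left-to-right square-and-multiply (Fig. 1); assumes n > 1 and d >= 0."""
    r0, r1 = 1, x % n
    for i in range(d.bit_length() - 1, -1, -1):   # i = l, l-1, ..., 0
        r0 = r0 * r0 % n                          # R0 <- R0 * R0
        if (d >> i) & 1:                          # d_i = 1
            r0 = r0 * r1 % n                      # R0 <- R0 * R1
    return r0


print(square_and_multiply(5, 117, 19), pow(5, 117, 19))
```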



Building on the earlier works of [CJRR99,CJ01], we use an additive splitting
of the form

$$x^d = x^{d-a} \cdot x^{a}$$

for a random $a$, as a means to mask exponent $d$. A straightforward application
of this splitting is inefficient, as it roughly doubles the running time: both $x^{d-a}$
and $x^{a}$ need to be computed.

The main idea behind self-randomized exponentiation consists in taking (part
of) $d$ as a source of randomness. So, random $a$ in the above splitting is chosen
equal to $d_{l \to i}$ for a random $i$, since the value of $x^{d_{l \to i}}$ is available in the
accumulator and need not be computed. There are various ways to apply this idea.
The next sections present several realizations.



3 Basic Algorithms 

3.1 First Algorithm 

Our first algorithm relies on the simple observation that, for any indices
$i_1, \ldots, i_f$ with $l \ge i_j \ge 0$, we have

$$x^d = x^{d_{l \to 0}} = x^{(d_{l \to 0} - d_{l \to i_1} - d_{l \to i_2} - \cdots - d_{l \to i_f})} \cdot x^{d_{l \to i_1}} \cdots x^{d_{l \to i_f}}.$$

If the $i_j$'s are randomly chosen, the exponentiation process becomes
probabilistic. A Boolean random variable $\rho$ is used to determine whether or not the
current loop index $i$ belongs to the set $\{i_1, \ldots, i_f\}$. If so, exponent $d$ is replaced
with $d - d_{l \to i}$. This is illustrated in the next figure.

As in the classical left-to-right exponentiation algorithms, a first accumulator,
$R_0$, is used to keep the value of $x^{d_{l \to i}}$. We also use a second accumulator,
Fig. 2. Masking of exponent d (I)



$R_1$, to keep the value of $\prod_j x^{d_{l \to i_j}}$. To ensure the correctness of the process,
the randomization step $d \leftarrow d - d_{l \to i_j}$ must not modify the $(l - i_j + 1)$ most
significant bits of $d$ (i.e., $d_{l \to i_j}$). This latter condition is guaranteed by checking
that $d_{i_j - 1 \to 0} \ge d_{l \to i_j}$ (see Fig. 2).

Applied to the classical square-and-multiply algorithm, we get the following 
algorithm. 



Input: x, d = (d_l, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← 1; R2 ← x; i ← l
while (i ≥ 0) do
    R0 ← R0 · R0 (mod N)
    if (d_i = 1) then R0 ← R0 · R2 (mod N)
    ρ ←_R {0, 1}
    if ((ρ = 1) ∧ (d_{i−1→0} ≥ d_{l→i})) then
        d ← d − d_{l→i}
        R1 ← R1 · R0 (mod N)
    endif
    i ← i − 1
endwhile
R0 ← R0 · R1 (mod N)
return R0



Fig. 3. Self-randomized square-and-multiply algorithm (I) 
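A Python sketch of Fig. 3 follows; it is a functional model only and is not constant-time. The injectable random source `rng` and the function name are additions for testing. The final multiplication by $R_1$ restores exactly the exponent parts that the randomization steps removed from $d$.

```python
import random


def self_randomized_sqmul_1(x: int, d: int, n: int, rng=None) -> int:
    """Sketch of Fig. 3; returns x^d mod n (n > 1).

    R0 holds x^{d_{l->i}} for the current d; R1 collects the factors
    x^{d_{l->i_j}} that were subtracted from the exponent.
    """
    rng = rng or random.SystemRandom()
    l = d.bit_length() - 1
    r0, r1, r2 = 1, 1, x % n
    for i in range(l, -1, -1):
        r0 = r0 * r0 % n
        if (d >> i) & 1:
            r0 = r0 * r2 % n
        high = d >> i                   # d_{l->i}
        low = d & ((1 << i) - 1)        # d_{i-1->0}
        if rng.getrandbits(1) and low >= high:    # rho = 1 and consistency holds
            d -= high                   # only d_{i-1->0} changes (Remark 1)
            r1 = r1 * r0 % n            # R1 <- R1 * x^{d_{l->i}}
    return r0 * r1 % n


print(self_randomized_sqmul_1(5, 117, 19), pow(5, 117, 19))
```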



Remark 1. In Fig. 3, since at iteration $i = i_j$ the updating step $d \leftarrow d - d_{l \to i}$
does not modify the $(l - i + 1)$ most significant bits of $d$, it can be equivalently
replaced with $d_{i-1 \to 0} \leftarrow d_{i-1 \to 0} - d_{l \to i}$.



Analysis. We remark that the randomization step (i.e., $d \leftarrow d - d_{l \to i_j}$) modifies
the $(l - i_j + 1)$ least significant bits of $d$. Furthermore, the “consistency” condition
(i.e., $d_{i_j - 1 \to 0} \ge d_{l \to i_j}$) implies that only about the lower half of exponent $d$ is
randomized. For the RSA cryptosystem with a small public exponent, this is not an
issue, since such a system leaks the most significant half of the bits of the
corresponding private exponent $d$ [Bon99, Section 4.5].



A simple variant. The previous methodology applies when the randomization
step is generalized to

$$d \leftarrow d - g \cdot d_{l \to i_j}$$

for some random $g$ such that $d_{i_j - 1 \to 0} \ge g \cdot d_{l \to i_j}$. The second accumulator
($R_1$, cf. Fig. 3) should then be updated accordingly as $R_1 \leftarrow R_1 \cdot R_0^{\,g} \pmod{N}$.
Of particular interest is the value $g = 2^\tau$, as the operation $g \cdot d_{l \to i_j}$ amounts to
a shift and the evaluation of $R_0^{\,g} \pmod{N}$ amounts to $\tau$ squarings. Again
with the example of the square-and-multiply algorithm, we have:



Input: x, d = (d_l, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← 1; R2 ← x; i ← l
while (i ≥ 0) do
    R0 ← R0 · R0 (mod N)
    if (d_i = 1) then R0 ← R0 · R2 (mod N)
    ρ ←_R {0, 1}; τ ←_R {0, ..., T}
    if ((ρ = 1) ∧ (d_{i−1→τ} ≥ d_{l→i})) then
        d_{i−1→τ} ← d_{i−1→τ} − d_{l→i}
        R3 ← R0
        while (τ > 0) do
            R3 ← R3 · R3 (mod N); τ ← τ − 1
        endwhile
        R1 ← R1 · R3 (mod N)
    endif
    i ← i − 1
endwhile
R0 ← R0 · R1 (mod N)
return R0



Fig. 4. Self-randomized square-and-multiply algorithm (I')



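Note that, at iteration $i = i_j$, the “consistency” condition $d_{i-1 \to 0} \ge 2^\tau d_{l \to i}$
is replaced with the more efficient test $d_{i-1 \to \tau} \ge d_{l \to i}$ (the two tests are
equivalent because $2^\tau d_{l \to i}$ is a multiple of $2^\tau$), and the updating step
$d \leftarrow d - 2^\tau d_{l \to i}$ is replaced with $d_{i-1 \to \tau} \leftarrow d_{i-1 \to \tau} - d_{l \to i}$,
as mentioned in Remark 1.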

Bound $T$ should be chosen as the most appropriate trade-off between the
randomization of the most significant bits of $d$ and the efficiency of evaluating
$\tau$ squarings, for a $\tau$ randomly drawn in $\{0, \ldots, T\}$.
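The variant of Fig. 4 can be sketched in Python as follows; `t_bound` plays the role of $T$, and its default value 4 is an arbitrary choice for illustration. As in the text, the test uses $d_{i-1 \to \tau} \ge d_{l \to i}$, and the extra register `r3` holds $R_0^{\,2^\tau}$. The sketch is functional only and not constant-time.

```python
import random


def self_randomized_sqmul_1b(x: int, d: int, n: int, t_bound: int = 4, rng=None) -> int:
    """Sketch of Fig. 4 (variant with g = 2^tau); returns x^d mod n (n > 1)."""
    rng = rng or random.SystemRandom()
    l = d.bit_length() - 1
    r0, r1, r2 = 1, 1, x % n
    for i in range(l, -1, -1):
        r0 = r0 * r0 % n
        if (d >> i) & 1:
            r0 = r0 * r2 % n
        rho = rng.getrandbits(1)
        tau = rng.randint(0, t_bound)              # tau drawn from {0, ..., T}
        high = d >> i                              # d_{l->i}
        window = (d & ((1 << i) - 1)) >> tau       # d_{i-1->tau} (0 if tau >= i)
        if rho and window >= high:
            d -= high << tau                       # d_{i-1->tau} -= d_{l->i}
            r3 = r0
            for _ in range(tau):                   # R3 <- R0^(2^tau) via tau squarings
                r3 = r3 * r3 % n
            r1 = r1 * r3 % n
    return r0 * r1 % n


print(self_randomized_sqmul_1b(5, 117, 19), pow(5, 117, 19))
```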
While it also randomizes the upper half of exponent $d$, the algorithm of Fig. 4
requires an additional register for computing $R_0^{\,2^\tau}$. The next section shows how
to remove this drawback.



3.2 Second Algorithm 

Our first algorithm (Fig. 3) only randomizes the lower half of exponent $d$ as
$d \leftarrow d - d_{l \to i_j}$, the restriction coming from the “consistency” condition, which
imposes a half-sized masking. In order to mask the whole value of $d$, we use the
additional trick that

$$x^{d} = \left(x^{d_{l \to i_j - c_j}}\right)^{2^{i_j - c_j}} \cdot x^{d_{i_j - c_j - 1 \to 0}}$$

for any $i_j \ge c_j > 0$. Actually, we successively apply the methodology of our first
algorithm to sub-exponent $d_{l \to i_j - c_j}$.² Moreover, to avoid the use of additional
registers, we only perform one randomization at a time. In other words, if we
update exponent $d$ as depicted in the next figure



[Figure: exponent $d$ with the prefix $d_{l \to i_j}$ and, below it, the window $d_{i_j - 1 \to i_j - c_j}$, which is replaced with $d_{i_j - 1 \to i_j - c_j} - d_{l \to i_j}$.]

Fig. 5. Masking of exponent d (II)



a new updating step of exponent $d$ will only be permitted after the complete
evaluation of $x^{d_{l \to i_j - c_j}} \pmod{N}$. A Boolean “semaphore”, $\sigma$, keeps track of
whether updating is permitted or not.

From Fig. 5, we observe that the $(l - i_j + 1)$ most significant bits of $d$ (i.e.,
$d_{l \to i_j}$) remain unchanged by the randomization step if

$$\begin{cases} d_{i_j - 1 \to i_j - c_j} \ge d_{l \to i_j}, \\ (i_j - 1) - (i_j - c_j) \ge l - i_j \iff c_j \ge l - i_j + 1. \end{cases}$$

We set $c_j = l - i_j + 1 + v_j$ for some nonnegative integer $v_j$. Together with
the condition $i_j \ge c_j > 0$, this implies $2 i_j \ge l + 1 + v_j$.

² Our first algorithm corresponds to the case $c_j = i_j$.
Remark 2. If $v_j$ is equal to 0, the “consistency” condition (i.e., $d_{i_j - 1 \to i_j - c_j} \ge d_{l \to i_j}$)
is satisfied half of the time, approximating $d_{i_j - 1 \to i_j - c_j}$ and $d_{l \to i_j}$ as
$(l - i_j + 1)$-bit randoms. In other words, if $v_j = 0$, randomization is possible half of
the time. A larger value of $v_j$ increases the success probability of the consistency
condition (and thus of the randomization). On the other hand, it also reduces the
set of possible counter indexes $i_j$ satisfying the condition $2 i_j \ge l + 1 + v_j$.

Figure 6 presents the resulting algorithm corresponding to the
square-and-multiply algorithm. For all $j$, the value of $v_j$ is taken equal to 0 (and thus
$c_j = l - i_j + 1$).



Input: x, d = (d_l, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← 1; R2 ← x; i ← l; c ← −1; σ ← 1
while (i ≥ 0) do
    R0 ← R0 · R0 (mod N)
    if (d_i = 1) then R0 ← R0 · R2 (mod N)
    if ((2i ≥ l + 1) ∧ (σ = 1)) then c ← l − i + 1 (†)
    else σ ← 0
    ρ ←_R {0, 1}
    ε ← ρ ∧ (d_{i−1→i−c} ≥ d_{l→i}) ∧ σ
    if (ε = 1) then
        R1 ← R0; σ ← 0
        d_{i−1→i−c} ← d_{i−1→i−c} − d_{l→i}
    endif
    if (c = 0) then
        R0 ← R0 · R1 (mod N); σ ← 1
    endif
    c ← c − 1; i ← i − 1
endwhile
return R0



Fig. 6. Self-randomized square-and-multiply algorithm (II) 
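Below is a Python sketch of the second algorithm with $v_j = 0$. Instead of transcribing Fig. 6 line by line, it keeps an explicit pending mask (playing the role of the semaphore $\sigma$ and the counter $c$). After $R_1$ has been folded back into $R_0$ at bit $i_j - c_j$, it adds $2^{i_j - c_j} d_{l \to i_j}$ back to its working copy of $d$, so that later masks are always taken from a prefix that matches $R_0$. This bookkeeping, the injectable `rng` and the names are our own choices; the sketch is functional only and not constant-time.

```python
import random


def self_randomized_sqmul_2(x: int, d: int, n: int, rng=None) -> int:
    """Sketch of the second algorithm (v_j = 0, so c_j = l - i_j + 1); returns x^d mod n."""
    rng = rng or random.SystemRandom()
    l = d.bit_length() - 1
    r0, r1, r2 = 1, 1, x % n
    pending = None       # [iterations until fold, value subtracted from d] while masking
    for i in range(l, -1, -1):
        r0 = r0 * r0 % n
        if (d >> i) & 1:
            r0 = r0 * r2 % n
        if pending is None and 2 * i >= l + 1 and rng.getrandbits(1):
            c = l - i + 1                              # c_j, so that i - c >= 0
            high = d >> i                              # d_{l->i}
            window = (d >> (i - c)) & ((1 << c) - 1)   # d_{i-1->i-c}
            if window >= high:                         # consistency condition
                d -= high << (i - c)                   # window <- window - d_{l->i}
                r1 = r0                                # R1 <- x^{d_{l->i}}
                pending = [c, high << (i - c)]
        if pending is not None:
            if pending[0] == 0:                        # bit i_j - c_j has been processed
                r0 = r0 * r1 % n                       # fold x^{d_{l->i_j}} back into R0
                d += pending[1]                        # working d matches R0 again
                pending = None
            else:
                pending[0] -= 1
    return r0


print(self_randomized_sqmul_2(5, 117, 19), pow(5, 117, 19))
```

The fold happens exactly $c_j$ iterations after the mask is set, which is where the squarings have raised $x^{d_{l \to i_j}}$'s missing contribution to the right power of two.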



4 Enhanced Algorithms 

4.1 Side-Channel Atomicity

As presented in the previous section, our algorithms involve numerous branchings
and so, although randomized, might be vulnerable to SPA-type attacks [KJJ99].

A generic yet efficient technique, called “side-channel atomicity” [CCJ],
allows one to remove branching conditions at negligible cost. As this is not the main
subject of this paper, and due to lack of space, we present hereafter, without any
further explanation, an atomic version of our first algorithm (Fig. 3). An atomic
version of our second algorithm (Fig. 6) can be found in Appendix A.
Input: x, d = (d_ℓ, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← 1; R2 ← x; i ← ℓ; k ← 0; ε ← 0
while (i ≥ 0) do
    R[ε] ← R0 · R[ε + 2k] (mod N)
    k ← k ⊕ (d_i ∧ ¬ε)
    d_{i−1→0} ← d_{i−1→0} + d_{ℓ→i} − d_{ℓ→i} × (1 + ε)
    i ← i − (¬k ∧ ¬ε)
    ρ ←R {0, 1}
    ε ← ρ ∧ ¬k ∧ ¬ε ∧ (d_{i−1→0} ≥ d_{ℓ→i})
endwhile
R0 ← R0 · R1 (mod N)
return R0



Fig. 7. Atomic self-randomized square-and-multiply algorithm (I) 
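As a rough illustration of the atomicity principle behind Fig. 7, the sketch below keeps only its skeleton: the randomization ($\varepsilon$) is removed, so every iteration performs exactly one multiplication $R_0 \leftarrow R_0 \cdot R_{2k}$ and the flag $k$ selects the operand by indexing rather than by branching. Python offers no timing guarantees, so this shows the control flow only; the function name is hypothetical.

```python
def atomic_square_and_multiply(x: int, d: int, n: int):
    """Branch-free left-to-right square-and-multiply in the style of Fig. 7,
    with the randomization (epsilon) removed.

    Each loop iteration performs exactly one multiplication R0 <- R0 * R[2k]:
    k = 0 selects R0 (a squaring), k = 1 selects R2 = x (a multiplication).
    Returns (x^d mod n, number of loop iterations).
    """
    regs = [1, 1, x % n]          # R0, R1 (unused here), R2
    i, k, iterations = d.bit_length() - 1, 0, 0
    while i >= 0:
        regs[0] = regs[0] * regs[2 * k] % n   # the only multiplication
        k ^= (d >> i) & 1                     # k <- k XOR d_i
        i -= 1 - k                            # i <- i - (NOT k)
        iterations += 1
    return regs[0], iterations
```

The iteration count equals the bit length of $d$ plus its Hamming weight, exactly as for ordinary square-and-multiply; only the shape of each iteration changes.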



4.2 Reversibility 

Throughout this section, we assume that our algorithms are given in a form free of conditional branchings (e.g., by using side-channel atomicity). We now study their respective strengths against a very powerful, imaginary adversary able to distinguish the performed (modular) multiplications. According to the registers used, Algorithms I and II involve four types of multiplication:

S : R0 ← R0 · R0 (mod N)
M : R0 ← R0 · R2 (mod N)
C1 : R1 ← R0 · R1 (mod N)
C2 : R0 ← R0 · R1 (mod N)

Provided that such an attacker makes no errors, Algorithms I and II can be reversed and the exponent d recovered. The reversing algorithms are presented in Fig. 8.

We stress that the assumption of recovering the exact sequence of multiplications is unrealistic for present-day cryptographic devices, as they include various countermeasures purposely preventing the distinction between S, M and C_*. Even under such a strong attack scenario, Algorithm II can be slightly modified to make the attack impractical.

Algorithm II (Fig. 6) is constructed by choosing the parameter $\nu_j = 0$ for all $j$. In fact, $\nu_j$ can be any nonnegative integer such that $2i_j \geq \ell + 1 + \nu_j$ (cf. Remark 2). Hence, the largest possible value for $\nu_j$ is $2i_j - \ell - 1$ and thus, since $\nu_j \geq 0$, the parameter $c_j = \ell - i_j + 1 + \nu_j$ can take any value in the set $\{\ell - i_j + 1, \ldots, i_j\}$. We generalize our second algorithm by randomly picking $c_j$ in the set $\{\ell - i_j + 1, \ldots, i_j\}$, i.e., by replacing the marked line in Fig. 6 by

    if ((2i ≥ ℓ + 1) ∧ (σ = 1)) then c ←R {ℓ − i + 1, ..., i}

Doing so, we obtain a third algorithm (Algorithm III). Its side-channel atomic version is given in full in Appendix A.
(a) Algorithm I

Input: L = (L_{ℓ'}, ..., L_0)
Output: d

m ← 0; d' ← 0; i ← ℓ'
while (i ≥ 0) do
    case
        (L_i = S): d' ← 2d'
        (L_i = M): d' ← d' + 1
        (L_i = C1): m ← m + d'
    endcase
    i ← i − 1
endwhile
return (m + d')

(b) Algorithm II

Input: L = (L_{ℓ'}, ..., L_0)
Output: d

d' ← 0; j ← ℓ; i ← ℓ'
while (i ≥ 0) do
    case
        (L_i = S): d' ← 2d'
        (L_i = M): d' ← d' + 1
        (L_i = C2): d' ← d' + D[(ℓ + j + 1)/2]
    endcase
    if (L_{i−1} = S) then D[j] ← d'; j ← j − 1
    i ← i − 1
endwhile
return d'

Fig. 8. Recovering exponent d in self-randomized exponentiation algorithms by distinguishing all the involved multiplications
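The reversal of Fig. 8(b) can be checked end to end with a short Python sketch. The trace generator repeats the exponent bookkeeping of Algorithm II with $\nu_j = 0$ under our own reading of Fig. 6 (prefix taken from the unmodified $d$, explicit pending flag); the function names are ours. The key step is the C2 case: a correction seen at bit $j$ stems from a randomization at bit $(\ell + j + 1)/2$, because with $c_j = \ell - i_j + 1$ the correction occurs at bit $i_j - c_j = 2i_j - \ell - 1$.

```python
import random


def alg2_trace(d: int, rng: random.Random) -> list:
    """Sequence of multiplication types ('S', 'M', 'C2') of Algorithm II
    (nu_j = 0), as an idealized attacker would observe it. Only the exponent
    bookkeeping is simulated; no modular arithmetic is needed for the trace."""
    l = d.bit_length() - 1
    work, trace, c, pending = d, [], -1, False
    for i in range(l, -1, -1):
        trace.append("S")
        if (work >> i) & 1:
            trace.append("M")
        if not pending and 2 * i >= l + 1:
            c = l - i + 1
            prefix = d >> i
            window = (work >> (i - c)) & ((1 << c) - 1)   # d_{i-1 -> i-c}
            if rng.getrandbits(1) and window >= prefix:
                work -= prefix << (i - c)
                pending = True
        elif pending and c == 0:
            trace.append("C2")
            pending = False
        c -= 1
    return trace


def reverse_alg2(trace: list) -> int:
    """Recover d from an error-free trace, following Fig. 8(b)."""
    l = trace.count("S") - 1
    d_prime, j, saved = 0, l, {}
    for pos, op in enumerate(trace):
        if op == "S":
            d_prime *= 2
        elif op == "M":
            d_prime += 1
        else:  # "C2": the matching randomization happened at bit (l + j + 1) / 2
            d_prime += saved[(l + j + 1) // 2]
        next_op = trace[pos + 1] if pos + 1 < len(trace) else "S"
        if next_op == "S":        # bit j is complete: D[j] <- d'
            saved[j] = d_prime
            j -= 1
    return d_prime
```

Because $\nu_j = 0$ makes the randomization position a deterministic function of the correction position, no guessing is involved; this is precisely what Algorithm III removes.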



Provided that multiplications can be distinguished, reversing Algorithm III amounts to successfully executing the following algorithm:



Input: L = (L_{ℓ'}, ..., L_0)
Output: d

d' ← 0; j ← ℓ; j_old ← ℓ; i ← ℓ'
while (i ≥ 0) do
    case
        (L_i = S): d' ← 2d'
        (L_i = M): d' ← d' + 1
        (L_i = C2): for j < j_try ≤ j_old, try d' ← d' + D[j_try]; j_old ← j
    endcase
    if (L_{i−1} = S) then D[j] ← d'; j ← j − 1
    i ← i − 1
endwhile
return d'

Fig. 9. Exhaustive search on Algorithm III



Since the attacker does not know the random $c_j$ chosen in the set $\{\ell - i_j + 1, \ldots, i_j\}$, she has to try all possible values. Such an exhaustive search rapidly becomes impractical, making our third algorithm secure even against very powerful adversaries.
4.3 Further Optimizations

The frequency with which the Boolean variable $\rho$ equals 1 can be seen as a tuning parameter for choosing the best trade-off between performance and security: more randomization penalizes the running time, and less randomization eases the exhaustive search.

A good way to lower the cost of the additional operations consists in slightly modifying the random generator outputting $\rho$ so that, when the Hamming weight of $d - a$ ($a$ may have several definitions according to Algorithm I, II, or III) is lower than the Hamming weight of $d$, $\rho$ has a higher probability of being 1, and conversely. With this trick, the self-randomized algorithm tends to select the case with the lowest Hamming weight, that is, the fastest branch. We note, however, that the algorithm cannot always select the fastest branch, as it would otherwise become deterministic and thus more easily reversible.
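A minimal sketch of such a biased generator, assuming Python's `random.Random` as the randomness source. The probabilities 0.75 and 0.25 are illustrative values of ours, not taken from the paper, and `a` stands for whichever quantity the chosen algorithm subtracts.

```python
import random


def hamming_weight(v: int) -> int:
    """Number of 1 bits of a nonnegative integer."""
    return bin(v).count("1")


def biased_rho(d: int, a: int, rng: random.Random,
               p_favour: float = 0.75, p_other: float = 0.25) -> int:
    """Draw rho, favouring the branch whose exponent has the lower Hamming weight.

    d - a must be nonnegative (the consistency condition guarantees this).
    p_favour and p_other are illustrative values, not taken from the paper;
    keeping both strictly between 0 and 1 keeps the choice random.
    """
    lighter = hamming_weight(d - a) < hamming_weight(d)
    p = p_favour if lighter else p_other
    return 1 if rng.random() < p else 0
```

Keeping both probabilities away from 0 and 1 is what prevents the algorithm from becoming deterministic, as required above.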



4.4 Average Timing 

The following table gives the complexity of the different algorithms in terms of multiplications.



Table 1. Average number of modular multiplications to perform an exponentiation of length 1024

Algorithm                         Multiplications (S, M, C1, C2)
Square and multiply               1536
Naive random exponent (a)         1536 + 96
Our algorithm (I)                 1536 + 512 × p
Our algorithm (II)                1536 + 10
Our algorithm (III)               1536 + 10



The overhead of Algorithms II and III (10 multiplications) corresponds in fact to an upper bound of $\log_2 \ell$. This is a very small quantity, but it provides an interesting amount of entropy: the number of possible randomizations for a given exponent is greater than ( -) >
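The bound of about ten extra multiplications can be reproduced with a back-of-the-envelope count, under our reading of Fig. 6 ($\nu_j = 0$, and a new randomization allowed from the bit just below the previous correction). With $u = \ell - i$, a randomization at $u$ is corrected at $u' = 2u + 1$, so the positions of successive randomizations at least double; the function name is ours.

```python
import math


def max_randomizations(l: int) -> int:
    """Largest possible number of randomizations (hence of C2 corrections) in
    Algorithm II with nu_j = 0, for an exponent d = (d_l, ..., d_0)_2."""
    count, u = 0, 0                  # u = l - i, distance from the top bit
    while 2 * u <= l - 1:            # randomization allowed only if 2i >= l + 1
        count += 1
        u = 2 * u + 2                # corrected at 2u + 1, next chance one bit later
    return count


print(max_randomizations(1023), math.log2(1024))
```

For $\ell = 1023$ this prints 9 and 10.0: at most nine corrections, below $\log_2(\ell + 1) = 10$.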

5 Conclusion

This paper introduced the concept of self-randomized exponentiation as an efficient means of preventing DPA-type attacks. Three different such algorithms (and some SPA-protected variants thereof) were described.

Self-randomized exponentiation presents the following interesting properties:

- it is fully generic in the sense that it is not restricted to a particular exponentiation algorithm;
- it is parameterizable: a parameter allows one to choose the best trade-off between security and performance;
- it can be combined with most other countermeasures;
- it is space-efficient, as only one additional long-integer register is required;
- it is flexible in the sense that it does not rely on certain group properties;
- it does not require prior knowledge of the order of the group in which the exponentiation is performed.

Of independent interest, the notion of reversibility in self-randomized exponentiation algorithms was defined and a concrete construction was given.

(a) By random exponent d, we mean the use of d as explained in the introduction, with a random r_2 of size 64 bits.
A Side-Channel Atomic Exponentiation Algorithms

Input: x, d = (d_ℓ, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← 1; R2 ← x; i ← ℓ; k ← 0; ε ← 0
σ ← 1; c ← −1
while (i ≥ 0) do
    θ ← (c = 0)
    R0 ← R0 · R[θ + 2k] (mod N)
    k ← k ⊕ (d_i ∧ ¬θ); i ← i − (¬k ∧ ¬θ)
    σ ← (σ ∨ θ) ∧ (2i ≥ ℓ + 1)
    c ← ¬σ · (c − ¬k) + (ℓ − i + 1) × σ
    ρ ←R {0, 1}
    ε ← ρ ∧ ¬k ∧ σ ∧ (d_{i−1→i−c} ≥ d_{ℓ→i})
    σ ← σ ∧ ¬ε
    d_{i−1→i−c} ← d_{i−1→i−c} + d_{ℓ→i} − d_{ℓ→i} × (1 + ε)
    R1 ← R[¬ε]
endwhile
return R0

Fig. 10. Atomic self-randomized square-and-multiply algorithm (II)



Input: x, d = (d_ℓ, ..., d_0)_2
Output: y = x^d (mod N)

R0 ← 1; R1 ← 1; R2 ← x; i ← ℓ; k ← 0; ε ← 0
σ ← 1; c ← −1
while (i ≥ 0) do
    θ ← (c = 0)
    R0 ← R0 · R[θ + 2k] (mod N)
    k ← k ⊕ (d_i ∧ ¬θ); i ← i − (¬k ∧ ¬θ)
    σ ← (σ ∨ θ) ∧ (2i ≥ ℓ + 1)
    γ ←R {ℓ − i + 1, ..., i}
    c ← ¬σ · (c − ¬k) + γ × σ
    ρ ←R {0, 1}
    ε ← ρ ∧ ¬k ∧ σ ∧ (d_{i−1→i−c} ≥ d_{ℓ→i})
    σ ← σ ∧ ¬ε
    d_{i−1→i−c} ← d_{i−1→i−c} + d_{ℓ→i} − d_{ℓ→i} × (1 + ε)
    R1 ← R[¬ε]
endwhile
return R0

Fig. 11. Atomic self-randomized square-and-multiply algorithm (III)
Abstract. This paper presents a scalable hardware implementation of both commonly used public-key cryptosystems, RSA and the Elliptic Curve Cryptosystem (ECC), on the same platform. The introduced hardware accelerator features a design that can be varied from very small (less than 20 Kgates), targeting wireless applications, up to very big (more than 100 Kgates), used for network security. In the latter option, it can include a few dedicated large-number arithmetic units, each of which is a systolic array performing Montgomery Modular Multiplication (MMM). The bound on the Montgomery parameter has been optimized to facilitate more secure ECC point operations. Furthermore, we present a new variant of the CRT scheme that is less vulnerable to side-channel attacks.

Keywords: FPGA design, systolic array, hardware implementation, RSA, ECC, Montgomery multiplication, side-channel attacks



1 Introduction 

The security of communication, or of digital data in general, is founded on various cryptographic algorithms. Implementations of Public Key Cryptography (PKC) in particular present a challenge on the vast majority of application platforms, ranging from software to hardware. Software platforms are cheap and more flexible, but it appears that only hardware implementations provide a suitable level of security, especially with respect to side-channel attacks. The two best-known and most widely used public-key cryptosystems are RSA [26] and ECC [18], [13]. RSA is believed to be in its “sunset”, but it still keeps up with requirements, namely because of various factors such as well-developed speed-ups in the form of Chinese Remainder Theorem (CRT) techniques and

* Lejla Batina and Siddika Berna Ors are funded by research grants of the Katholieke Universiteit Leuven, Belgium. This work was supported by Concerted Research Action GOA-MEFISTO-666 of the Flemish Government and by the FWO “Identification and Cryptography” project (G.0141.03).



T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 250-263, 2004.
© Springer-Verlag Berlin Heidelberg 2004
its suitability for hardware, RSA is the main technology for high-speed applications in network security, finance, etc. On the other hand, ECC is expected to take the lead in wireless applications. The reason is that ECC offers higher speed, lower power consumption and smaller certificates, which are all necessities in these areas, including the smartcard industry. In short, it is highly desirable to develop an architecture that can efficiently perform both RSA and ECC: RSA for VPNs, banking, etc., and ECC mostly for wireless applications.

Our contribution deals with an FPGA implementation of the RSA and ECC cryptosystems over a field of prime characteristic. The architecture for Montgomery Modular Multiplication (MMM) used in this work is efficient and secure [22]. The systolic array is used for arbitrary precision in bits, hence easily bridging the gap between ECC bit-lengths of 160 bits and RSA moduli of 2048 (or more) bits. The notion of scalability we discuss includes both freedom in the choice of operand precision and adaptability to any desired gate complexity. The latter is usually referred to as “flexibility”. We use modular exponentiation based on Montgomery's method without any modular reduction, achieving the best possible bound [29], [3]. We are the first to introduce a similar bound for ECC, which allows us to perform very secure and yet efficient point addition and doubling. We show that in the case of two or more arithmetic units, a high level of parallelism can be achieved by alternating ECC operations between those units. The eventual parallelism between several units, and also between the cells of the systolic array, is beneficial for side-channel resistance. Moreover, in this work we introduce a new variation of Garner's scheme for CRT decryption, which has a built-in countermeasure against timing and power-analysis-based attacks.

Since the introduced architecture was dedicated to RSA applications, it was natural to implement elliptic curve arithmetic in GF(p). In this way all the required components were already available, as ECC in GF(p) is based on ordinary modular arithmetic. Assuming one uses projective coordinates, modular multiplication remains the most time-consuming operation for ECC. Hence, an efficient implementation relies on efficient modular multiplication, as is the case for RSA. Nevertheless, it is also important to focus on constant-time algorithms, which are less likely to leak side-channel information. To conclude, in this work we aimed to introduce a secure combined RSA-ECC implementation that also meets the high speed demands set by the state of the art in RSA hardware implementation (see for example [8]).

The remainder of this paper is organized as follows. Section 2 gives a survey of previous hardware implementations of public-key algorithms relevant to our work. In Section 3, we outline the architecture of the targeted implementation platform. Section 4 describes new options for point operations. In Section 5, the implementation results and timings are given. Section 6 introduces a new variant of Garner's scheme for CRT which is equally efficient but more resistant to side-channel attacks. Implications of the proposed changes for the security of both RSA and ECC are considered in Section 7. Section 8 concludes the paper.




252 L. Batina, G. Bruin-Muurling, and S.B. Ors



2 Related Work 

This section reviews some of the most relevant previous work on hardware implementations of PKC. The vast majority of published work considering implementations of PKC deals with software platforms. Some of the work targets FPGAs, and only very few publications present an ASIC implementation of ECC over a field of prime characteristic. Most of the work is done in binary fields, and some authors have considered dual-field implementations, i.e., ECC in prime and binary fields.

Goodman and Chandrakasan proposed a domain-specific reconfigurable cryptographic processor (DSRCP) in [8]. The DSRCP performs a variety of algorithms ranging from modular integer arithmetic to elliptic curve arithmetic. They mainly discussed arithmetic in binary fields. The most recent published work is that of Satoh and Takano [27]. They presented the dual-field multiplier with the best performance so far in both types of fields. The throughput of EC scalar multiplication is maximized by the use of a Montgomery multiplier and an on-the-fly redundant binary converter. The main strength of their design is its scalability in operand size and its flexibility between speed and hardware area. Another hardware solution for both types of fields was presented by Wolkerstorfer in [31]. The author introduced a low-power design featuring a short critical path to enable high clock frequencies. Most operations are executed within a single clock cycle, and a redundant number representation is used. The idea of a unified multiplier was first introduced by Savaş et al. in [28]. The authors discussed a scalable and unified architecture for a Montgomery multiplication module. They deployed an array of word-size processing units organized in a pipeline. The same idea is the basis of the work of Großschädl [9]. The bit-serial multiplier introduced there performs multiplications in both types of fields. The author also modified the classical MSB-first version of iterative modular multiplication. All concepts are introduced in detail, but the actual VLSI implementation is not given. Some hardware implementations in GF(p) on FPGAs are also known. An ECC-only processor over GF(p) was proposed by Orlando and Paar [21]. They proposed a so-called Elliptic Curve Processor (ECP), which is scalable in terms of area and speed. The ECP is also best suited for projective coordinates and uses a new type of high-radix, precomputation-based Montgomery multiplier. The scalability of the multiplier to larger fields was also verified for a field of 521 bits. The authors estimated a timing of 3 ms for computing one point multiplication in a 192-bit prime field. Ors et al. discussed an ECC processor optimized for MMM in [23]. They described an efficient implementation of an elliptic curve processor over GF(p). The processor can be programmed to execute modular multiplication, addition/subtraction, multiplicative inversion, EC point addition/doubling and point multiplication. A detailed overview of hardware implementations for PKC is given in [4].

Still, much of the work on ECC over GF(p) deals with software implementations, whereas there exist many hardware implementations over binary fields. It appears that arithmetic in characteristic 2 is easier to implement and that area and power consumption are smaller than in the case of GF(p). This is believed to be true, but only for platforms where specialized arithmetic coprocessors for finite field arithmetic are not available. On the other hand, an advantage of prime fields is their suitability for both RSA and ECC, with resourceful sharing of hardware.

3 Previous Work and Background 

In this paper we discuss how an FPGA implementation of Montgomery multiplication that was originally designed for RSA can efficiently be used to perform prime-field ECC operations. This design consists of a Large Modular Montgomery Multiplier (MMM), designed as a systolic array. This array is one-dimensional and consists of a fixed number of Processing Cells (PCs). The MMM performs Montgomery modular multiplication, which consists of the following operation: $\mathrm{Mont}(X, Y) = XYR^{-1} \bmod N$.

In the remainder, we call $AR \bmod N$ the Montgomery representation of $A$. For modular exponentiation with the MMM, all intermediate results are in this form. A number can be transformed into its Montgomery representation by performing a Montgomery multiplication of that number with $R^2 \bmod N$. For the transformation from the Montgomery representation back to the normal form, a Montgomery multiplication with 1 suffices.
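To make the two conversions concrete, here is a minimal software model of Montgomery multiplication in the standard REDC form, assuming an odd modulus $N$ and $R = 2^r > N$. It models only the arithmetic, not the bit-level systolic evaluation performed by the MMM; the helper names are ours.

```python
def mont_setup(n: int):
    """Pick R = 2^r > N for an odd modulus N and n' with N * n' = -1 (mod R)."""
    r = n.bit_length()
    R = 1 << r
    n_prime = (-pow(n, -1, R)) % R
    return r, R, n_prime


def mont(x: int, y: int, n: int, r: int, n_prime: int) -> int:
    """Mont(X, Y) = X * Y * R^-1 mod N for X, Y < N (REDC, final subtraction kept)."""
    mask = (1 << r) - 1
    t = x * y
    m = ((t & mask) * n_prime) & mask     # m = t * n' mod R
    u = (t + m * n) >> r                  # exact division: t + m*N = 0 mod R
    return u - n if u >= n else u


n = 1000003                               # any odd modulus
r, R, n_prime = mont_setup(n)
R2 = R * R % n                            # R^2 mod N, precomputed once


def to_mont(a: int) -> int:
    """A -> A*R mod N, via a Montgomery multiplication with R^2 mod N."""
    return mont(a, R2, n, r, n_prime)


def from_mont(a_m: int) -> int:
    """A*R mod N -> A, via a Montgomery multiplication with 1."""
    return mont(a_m, 1, n, r, n_prime)


print(from_mont(mont(to_mont(1234), to_mont(5678), n, r, n_prime)) == 1234 * 5678 % n)
```

Products of two values in Montgomery representation stay in that representation, which is why an exponentiation only needs one conversion at each end.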



3.1 Systolic Array 

Figure 1 shows a schematic of the systolic array that was implemented in the MMM. A PC contains adders and multipliers that can process $\alpha$ bits of $X$ and $\beta$ bits of $Y$ in one clock cycle. Here $X$ and $Y$ are the multiplicand and the multiplier, respectively. The computation that each PC performs in every clock cycle is described in detail in Section 4.4.




Fig. 1. Schematic of the Modular Montgomery Multiplier. 



In Montgomery's original formulation, a reduction was needed after each multiplication in the exponentiation algorithm [19]. The inputs were restricted to $X, Y < N$ and the output $T$ was bounded by $T < 2N$. As a result, in the case $T \geq N$, $N$ must be subtracted so that the output can be used as input to the next multiplication. To avoid this subtraction, a bound for $R$ can be calculated such that for inputs $X, Y < 2N$ the output is also bounded by $T < 2N$.

In [3] the need to avoid this reduction after each multiplication is addressed. In practice, this means that the output of a multiplication can be used directly as an input to the next Montgomery multiplication. The following theorem is proven in [3].

Theorem 1. The result of a Montgomery multiplication $XYR^{-1} \bmod N$, computed without the final subtraction, is smaller than $2N$ when $X, Y < 2N$ and $R > 4N$.

The final step of the modular exponentiation is the conversion back to the integer domain, i.e., computing the Montgomery multiplication of the last result and 1. The same arguments prove that this final step remains within the bound $\mathrm{Mont}(T, 1) \leq N$. In practice, the value $N$ will never occur, since $A \neq 0$.
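The effect of Theorem 1 can be checked on examples with a small Python experiment: an exponentiation kept entirely in Montgomery form, with $R = 2^r > 4N$ and no final subtraction after any multiplication, never produces an intermediate value of $2N$ or more, and the closing multiplication by 1 returns the ordinary result. The sketch assumes a prime modulus (here $2^{61} - 1$) so that the value $N$ cannot appear at the end; the function names are ours.

```python
def mont_no_sub(x: int, y: int, n: int, r: int) -> int:
    """(X*Y + m*N) / R with R = 2^r, without the final conditional subtraction."""
    R = 1 << r
    n_prime = (-pow(n, -1, R)) % R
    m = (x * y * n_prime) % R
    return (x * y + m * n) >> r


def mont_exp_no_sub(a: int, e: int, n: int):
    """Left-to-right exponentiation kept in Montgomery form with no reductions.

    Returns (result of the final Mont(T, 1), largest intermediate T seen).
    """
    r = n.bit_length() + 2                # R = 2^(bits + 2) > 4N, as Theorem 1 needs
    R = 1 << r
    a_m, acc = a * R % n, R % n           # Montgomery forms of a and of 1
    largest = acc
    for bit in bin(e)[2:]:
        acc = mont_no_sub(acc, acc, n, r)
        if bit == "1":
            acc = mont_no_sub(acc, a_m, n, r)
        largest = max(largest, acc)
    return mont_no_sub(acc, 1, n, r), largest
```

Intermediate values may exceed $N$, but never $2N$, so they can be fed straight back into the multiplier.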

3.2 ECC Processor 

The MMM can be used not only for a fast RSA implementation but also for ECC point operations in the prime field. Due to the scalability of the design, the FPGA architecture can perform both, i.e., efficient exponentiations on large operands (for RSA) and modular multiplications on the smaller ECC operands. Figure 2 gives a schematic of an FPGA implementation for ECC. One or two MMMs are used to perform the modular (Montgomery) multiplications. A Large Number Co-Processor (LNCP) is added to the design to perform the additions and subtractions. These units have their own RAMs and are connected by a data bus.

As already explained, the performance of an elliptic curve cryptosystem is primarily determined by the efficient realization of the arithmetic operations (addition, multiplication and inversion) in the underlying finite field. If projective coordinates are used, the cost of inversion becomes negligible. Therefore, coprocessors for elliptic curve cryptography are primarily designed to accelerate field multiplication. For multiplication in the prime field GF(p), all the work done for the RSA implementation is relevant. The only difference is that shorter bit-lengths are used, i.e., 160 to 300 bits. Scalability is again a point of concern, and interoperability between different implementations even more so.



4 New Implementation 

In this section we present our FPGA implementation of ECC point operations over prime fields.

Fig. 2. Schematic of the Modular Montgomery Multiplier.



4.1 Point Addition 

Point addition and doubling can be performed according to the algorithm given in [5].

Here we assume that the two points to be added, i.e., $P = (X_1, Y_1, Z_1)$ and $Q = (X_2, Y_2, Z_2)$, have already been transformed to projective coordinates and Montgomery representation. The result point is $R = P + Q = (X_3, Y_3, Z_3)$.

Scheduling of point addition. Point addition can be performed even more efficiently if two MMM units are used. The operations can be conveniently divided between the two units. (Modular) addition and subtraction are done on a Large Number Co-processor. These operations can be performed at the same time as the Montgomery multiplications. The scheduling shown in Table 1 can be used. Table 1 shows that the performance can almost be doubled by using two MMM units.



4.2 Point Doubling 

Here we discuss a special case of point addition, i.e., point doubling, where the points $P$ and $R$ are given as $P = (X_1, Y_1, Z_1)$ and $R = 2P = (X_3, Y_3, Z_3)$.

Scheduling of point doubling. Table 2 gives a possible schedule for point doubling over the two MMMs and the LNCP.

The difficulty in scheduling point doubling lies in the operations scheduled on MMM2 and the LNCP, which all depend on the result of the previous operation.
Table 1. Scheduling of point addition.
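Table 2. Scheduling of point doubling.

Units: MMM1 | MMM2 | LNCP
Operations: $Z_1^2$; $Y_1^2$; $\lambda_2 = 4X_1Y_1^2$; $\lambda_3 = 8Y_1^4$; $\lambda_1 = 3X_1^2 + aZ_1^4$; $\lambda_1^2$; $Z_3 = 2Y_1Z_1$; $X_3 = \lambda_1^2 - 2\lambda_2$; $\lambda_2 - X_3$; $\lambda_1(\lambda_2 - X_3)$; $Y_3 = \lambda_1(\lambda_2 - X_3) - \lambda_3$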



Point multiplication can be implemented as a repeated combination of point 
addition and point doubling. 
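The doubling formulas listed in Table 2 can be checked in software. The sketch below implements them in Jacobian projective coordinates, where $x = X/Z^2$ and $y = Y/Z^3$ (our assumption about the coordinate system of [5], which is consistent with the $\lambda$ terms), and compares the result with textbook affine doubling on a toy curve $y^2 = x^3 + 2x + 3$ over GF(97), chosen only for illustration.

```python
def jacobian_double(X1, Y1, Z1, a, p):
    """R = 2P in Jacobian coordinates (x = X/Z^2, y = Y/Z^3) on y^2 = x^3 + a x + b."""
    l1 = (3 * X1 * X1 + a * pow(Z1, 4, p)) % p   # lambda1 = 3 X1^2 + a Z1^4
    Z3 = 2 * Y1 * Z1 % p                          # Z3 = 2 Y1 Z1
    l2 = 4 * X1 * Y1 * Y1 % p                     # lambda2 = 4 X1 Y1^2
    X3 = (l1 * l1 - 2 * l2) % p                   # X3 = lambda1^2 - 2 lambda2
    l3 = 8 * pow(Y1, 4, p) % p                    # lambda3 = 8 Y1^4
    Y3 = (l1 * (l2 - X3) - l3) % p                # Y3 = lambda1 (lambda2 - X3) - lambda3
    return X3, Y3, Z3


def to_affine(X, Y, Z, p):
    """Map Jacobian (X, Y, Z) to affine (X / Z^2, Y / Z^3)."""
    z_inv = pow(Z, -1, p)
    return X * z_inv**2 % p, Y * z_inv**3 % p


def affine_double(x, y, a, p):
    """Reference: textbook affine doubling (requires y != 0)."""
    s = (3 * x * x + a) * pow(2 * y, -1, p) % p
    x3 = (s * s - 2 * x) % p
    return x3, (s * (x - x3) - y) % p
```

The Jacobian version needs no field inversion, which is what makes it attractive for a multiplier-centred design such as the MMM.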

4.3 Modular Addition and Subtraction 

Modular (i.e., Montgomery) multiplication, modular addition and modular subtraction are the basic operations for point addition. MMM is performed on our highly scalable Montgomery-based multiplier. Modular addition and modular subtraction can be implemented as repeated addition; however, the number of additions/subtractions would then be data dependent. Let us take a closer look at these two operations. As proven in Section 3.1, the result of an operation on our multiplier is always smaller than twice the modulus ($2N$). All modular additions and subtractions in the point addition scheme operate on two outputs of the Montgomery multiplier.

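For example:

$$\lambda_1 = X_1 Z_2^2 < 2p \quad \text{and} \quad \lambda_2 = X_2 Z_1^2 < 2p$$

$$\lambda_3 = \lambda_1 - \lambda_2 \bmod p \qquad (1)$$

$$\lambda_7 = \lambda_1 + \lambda_2 \bmod p \qquad (2)$$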

The result of the modular addition or subtraction is again the input of another Montgomery multiplication and can therefore be larger than the modulus, but it must be nonnegative. If it were possible to perform the previous calculations as “normal”, i.e., non-modular, additions and subtractions, the operations would be very efficient and, more importantly, constant-time.

Keeping in mind the $2p$ bound for the operands, which results from the bound for the Montgomery parameter, we get:

$$0 \leq \lambda_1 + \lambda_2 < 4p - 1$$

$$0 < \lambda_1 + 2p - \lambda_2 < 4p$$

Our target is now to fix a bound for the Montgomery parameter such that we can use these non-modular additions and subtractions instead of the modular forms. To achieve this, we must ensure that inputs $X$ and $Y$ of the Montgomery multiplier that are smaller than $4p$ result in a Montgomery product smaller than $2p$.

As already mentioned, in the original implementation of our MMM the inputs of a Montgomery multiplication must be smaller than $2p$. We will use the following lemma.

Lemma 1. If the Montgomery parameter $R$ satisfies $R > 16N$, then for inputs $X, Y < 4N$ the result $T$ of the MMM satisfies $T < 2N$ (as required).

Proof: The Montgomery multiplication as implemented in the MMM calculates the following:

$$T = \frac{XY + mN}{R} < \frac{XY}{R} + N \qquad (3)$$

where $m$ is calculated modulo $R$, i.e., $0 \leq m < R$. Filling in the bounds for the inputs and $R > 16N$, we get

$$T < \frac{XY}{R} + N < \frac{4N \cdot 4N}{R} + N < 2N. \qquad (4)$$

If $n$ is the length of the modulus $N$ in bits, then $N < 2^n$ and $16N < 2^{n+4}$. With $R = 2^r$, we get $r \geq n + 4$. □

We have shown that, for all modulus lengths, inputs smaller than $4p$ result, after a Montgomery multiplication on the MMM, in a value smaller than $2p$. Therefore we can use the more efficient and constant-time implementation of modular addition and subtraction. Furthermore, there is no loss in efficiency caused by this enlarged bound, because $R$ is usually already larger than this bound (especially for $\alpha, \beta > 1$).
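A quick randomized check of Lemma 1, as a sketch: operands below $2p$ are combined with the lazy, non-modular addition $\lambda_1 + \lambda_2$ and the offset subtraction $\lambda_1 + 2p - \lambda_2$, and the results are fed to a Montgomery multiplication with $R = 2^{n+4}$ and no final subtraction. The modulus $2^{127} - 1$, the trial count and the function names are illustrative choices of ours.

```python
import random


def mont_no_sub(x: int, y: int, n: int, r: int) -> int:
    """(X*Y + m*N) / R with R = 2^r and no final conditional subtraction."""
    R = 1 << r
    n_prime = (-pow(n, -1, R)) % R
    m = (x * y * n_prime) % R
    return (x * y + m * n) >> r


def lazy_add(u: int, v: int) -> int:
    """Non-modular addition: operands < 2p give a result < 4p."""
    return u + v


def lazy_sub(u: int, v: int, p: int) -> int:
    """Non-modular subtraction offset by 2p: operands < 2p give 0 < result < 4p."""
    return u + 2 * p - v


def check_lemma1(p: int, trials: int, rng: random.Random) -> int:
    """Largest Montgomery output seen when both inputs come from lazy add/sub."""
    r = p.bit_length() + 4            # R = 2^(n+4) > 16p, as in Lemma 1
    worst = 0
    for _ in range(trials):
        l1, l2, l3, l4 = (rng.randrange(2 * p) for _ in range(4))
        t = mont_no_sub(lazy_add(l1, l2), lazy_sub(l3, l4, p), p, r)
        worst = max(worst, t)
    return worst
```

Since $\lambda_1 + 2p - \lambda_2 \equiv \lambda_1 - \lambda_2 \pmod p$, the offset changes nothing modulo $p$; it only keeps the operand nonnegative without a data-dependent correction.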



4.4 Montgomery Modular Multiplication 

The processing cells in the systolic array shown in Figure 1 perform Equation (5). $x_i$ and $m_i$ have $\alpha$ bits, $y_j$ and $n_j$ have $\beta$ bits, and $co_{i,j}$ denotes the carry chain on the array. Because the critical path of the systolic array is the same as the critical path of one PC, the clock frequency of the Montgomery multiplier is the same for all bit-lengths. This property makes the circuit well suited for both RSA and ECC.

2 X c\i^j ~\~ 2 X cOi^j ti^j — ti — Xi X yj mi X rij -t- 2 x — i -t- eOi^j — i (5) 

Parameters $\alpha$ and $\beta$ are both 4 in this implementation. Table 3 shows the performance of the FPGA implementation of the Montgomery multiplier. Parameter $n$ is the bit-length of $N$, and $l$ in Figure 1 equals $n/4$.
Table 3. The performance of the FPGA implementation of the Montgomery multiplier.

Number of clock cycles    3l + 7
Clock period              19 ns
Clock frequency           53 MHz
Total latency             14.25n + 133 ns
Number of gates           4547
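The latency row of Table 3 follows from the other rows if, as we infer from the numbers, $l = n/4$ (that is, $n/\alpha$ with $\alpha = 4$): $(3l + 7) \cdot 19$ ns $= 14.25n + 133$ ns. The helper below, whose name is ours, simply evaluates this product.

```python
def mmm_latency_ns(n_bits: int, alpha: int = 4, period_ns: float = 19.0) -> float:
    """Latency of one multiplication on the MMM: (3l + 7) cycles of 19 ns.

    l = n_bits / alpha is our inference from Table 3, not a figure stated there.
    """
    l = n_bits // alpha
    return (3 * l + 7) * period_ns


for n in (160, 192, 1024, 2048):
    print(n, mmm_latency_ns(n), 14.25 * n + 133)
```

Both columns agree for every bit-length that is a multiple of 4; for a 1024-bit modulus one multiplication takes 14725 ns.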



5 Results and Timings 



In their work [16], Lenstra and Verheul compared the security of RSA and ECC key lengths. They introduced a table of corresponding key bit-lengths assuring minimal security in the years to come for the two public-key systems. Figure 3 gives the performance of ECC and RSA for the key sizes given in their paper. The figures also show that, especially for future applications, the performance of ECC is more attractive than that of RSA.




Fig. 3. Performance of RSA and ECC (curves: RSA with 1 MMM, RSA with 2 MMMs, ECC with 1 MMM, ECC with 2 MMMs).

Fig. 4. Performance of ECC with 1 or 2 MMMs.



Figure 4 shows the performance of an ECC implementation with one and with two MMMs. The implementation with two MMMs is scheduled according to Table 1 and Table 2. Figure 4 shows a speed-up of almost a factor of 2 for the two-MMM variant.

For the sake of precision, we give detailed performance results in Table 4.
Table 4. RSA and ECC performance in ms at 53 MHz.

Year [16]   RSA size (bits)   RSA, 1 MMM   RSA, 2 MMMs   ECC size (bits)   ECC, 1 MMM   ECC, 2 MMMs
2002        1024              22.8         15.2          139               5.3          3.1
2014        1536              52.5         35            172               8            4.7
2023        2048              90.6         60.4          197               10.5         6.1
2051        4096              350.9        234           275               11.4         6.7



6 Side-Channel Security of CRT



We now briefly review some benefits of Montgomery's multiplication method, which are also evident for CRT implementations. In [11,29], $R > 4N$ is proposed, which, with some savings in hardware, completely omits all reduction steps.

Implementations of CRT schemes in particular are found to be very sensitive to side-channel attacks. For example, a new SPA-based attack targeting the algorithm of Garner [17] was recently introduced by Novak [20]. This scheme is often used in all sorts of applications, including smartcards. It is usually implemented as follows:



Algorithm 1. Garner's algorithm for CRT

INPUT: ciphertext $c$, $N = p \cdot q$ ($p > q$) and precomputed values $d_1 = d \bmod (p - 1)$, $d_2 = d \bmod (q - 1)$ and $U = q^{-1} \bmod p$ ($c_1 = c \bmod p$, $c_2 = c \bmod q$)

OUTPUT: $R = M = t + q \cdot \big(((s - t) \cdot (q^{-1} \bmod p)) \bmod p\big)$

1. $s = M_p = c_1^{d_1} \bmod p$
2. $t = M_q = c_2^{d_2} \bmod q$
3. $x = (s - t) \bmod p$
4. $R = t + q \cdot ((x \cdot U) \bmod p)$



The third step is the critical one. Novak observed that if the modular subtraction is implemented in the common way, it may leak information. More precisely, to perform the subtraction (mod $p$), one has to check the sign of $s - t$ and conditionally add $p$ if $s - t < 0$ ($p > q$ is required). Novak managed to build a successful attack based on this observation: an implementation of the above algorithm can produce an optional pattern in a power trace as a result of the conditional addition.

We propose the following solution. Instead of the subtraction mod $p$, one can compute the following:

$$x = s + p - t. \qquad (6)$$
For $p > q$ the result stays within the bounds $0 < x < 2p$, which can be handled easily if Step 4 is implemented with Montgomery's algorithm. Namely, the algorithm proposed in [11,29] for Montgomery modular multiplication takes two inputs $0 \leq X, Y < 2p$, and the result is also within the same interval if the proper bound for the Montgomery parameter $R$ is chosen. This result is converted from the Montgomery domain to the usual domain by a Montgomery multiplication with 1. Changed in this way, Garner's scheme always follows a constant execution path. We prove this in more detail.

Claim. The result of $x = s + p - t$ is always smaller than $2p$ for the parameters $s, p, t$ defined above, i.e., $s = M_p = c_1^{d_1} \bmod p$, $t = M_q = c_2^{d_2} \bmod q$ and $p > q$.

Proof: It is shown that with Montgomery's algorithm and $R > 4N$, the final result of a modular exponentiation is bounded by the modulus $N$ (see Theorem 1).

Now we can prove the claim. Clearly $0 \leq s < p$ and $0 \leq t < q$. We assume $p > q$, which does not restrict generality, as the other case is almost the same. Then $x = s + p - t < p + p - 0 = 2p$. Hence, if the multiplication in Algorithm 1 is implemented as a Montgomery multiplication, no conditional correction is required, unlike in the original algorithm. This concludes the proof. □
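The modified Step 3 can be exercised in software. The following Python sketch implements Algorithm 1 with either the usual $x = (s - t) \bmod p$ or the proposed $x = s + p - t$ of Eq. (6); the function name and the textbook toy key ($p = 61$, $q = 53$, $e = 17$, $d = 2753$) are ours. It only demonstrates that both variants decrypt correctly: Python's `%` operator is not constant-time, and the sketch does not model the Montgomery hardware.

```python
def garner_crt(c: int, d: int, p: int, q: int, branch_free: bool = True) -> int:
    """RSA decryption with Garner's CRT recombination (Algorithm 1).

    branch_free=True uses the proposed x = s + p - t (Eq. (6)), which needs no
    sign test; branch_free=False uses the usual x = (s - t) mod p.
    """
    assert p > q
    d1, d2 = d % (p - 1), d % (q - 1)
    u = pow(q, -1, p)                  # U = q^{-1} mod p
    s = pow(c % p, d1, p)              # step 1: s = c1^d1 mod p
    t = pow(c % q, d2, q)              # step 2: t = c2^d2 mod q
    if branch_free:
        x = s + p - t                  # 0 < x < 2p, no conditional correction
    else:
        x = (s - t) % p                # step 3, sign handling hidden in '%'
    return t + q * ((x * u) % p)       # step 4


print(garner_crt(pow(65, 17, 3233), 2753, 61, 53))   # → 65
```

Because $x \equiv s - t \pmod p$ in both branches, the recombined value is identical; the gain lies only in removing the data-dependent correction.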

7 Security Remarks 

In this section we address side-channel security, i.e., resistance to timing attacks [14], [10] and power-analysis-based attacks [15]. These types of attacks, together with fault-analysis-based attacks [6], [12], [2], electromagnetic analysis attacks (EMA) [25], [7] and other physical attacks such as probing attacks [1], are a major concern, especially for wireless applications. Mainly because of space limitations, we only briefly discuss the first two, which are also believed to be the most practical.

Namely, computations performed in non-constant time, i.e., computations whose running time depends on the values of the operands, may leak secret key information. This observation is the basis for timing attacks. Power-analysis-based attacks, on the other hand, use the fact that the power consumed at any particular time during a cryptographic operation is related to the function being performed and the data being processed. Such an attack can usually be performed easily because smartcards, for example, receive their power externally, so an attacker can easily get hold of the source of this side-channel information.

In our implementation all conditional modular reductions (final subtractions) are excluded. The weakness of the conditional statements in the algorithm (used to realize the reduction step) is the time variation they cause, and therefore they should be omitted. By using an optimal upper bound, the number of iterations required in the algorithm based on Montgomery's method of multiplication can be reduced [30]. Another source of timing information leakage, observed by Quisquater et al. [10] and by Walter [30], is the timing difference between “square” and “multiply”. This information can be used to attack RSA even when advanced exponentiation methods are used. In our architecture this weakness is removed, because the same systolic array performs both squarings and multiplications, which are therefore indistinguishable with respect to timing.
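The timing argument rests on squarings and multiplications being one and the same Montgomery multiplication, with no data-dependent final subtraction once $R > 4N$ (the bound attributed to Theorem 1 above). The following Python sketch illustrates that idea, not the systolic array itself; `mont_pow` and its helpers are hypothetical names. It runs a left-to-right exponentiation in which every operation calls the same routine and records the largest intermediate value, which should stay below $2N$. The exponent bits still decide whether a multiplication follows a squaring, so the sketch says nothing about the exponentiation order itself.

```python
def mont_mul(a, b, n, r_bits, n_neg_inv):
    """Montgomery product a*b*R^-1 (mod n) with no final conditional subtraction."""
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_neg_inv) & mask
    return (t + m * n) >> r_bits


def mont_pow(base, e, n):
    """Left-to-right exponentiation: squarings and multiplications use one routine.

    Returns (base**e mod n, largest intermediate Montgomery value).
    """
    r_bits = (4 * n).bit_length()          # R > 4n keeps every operand below 2n
    R = 1 << r_bits
    n_neg_inv = -pow(n, -1, R) % R

    def mul(a, b):
        return mont_mul(a, b, n, r_bits, n_neg_inv)

    r2 = R * R % n
    x_m = mul(base % n, r2)                # base in the Montgomery domain
    acc = mul(1, r2)                       # 1 in the Montgomery domain
    largest = acc
    for bit in bin(e)[2:]:
        acc = mul(acc, acc)                # squaring
        largest = max(largest, acc)
        if bit == "1":
            acc = mul(acc, x_m)            # multiplication: same routine
            largest = max(largest, acc)
    return mul(acc, 1), largest            # Montgomery multiplication with 1
```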

Besides that, some other precautions have been introduced against power analysis attacks. The fact that all of the PCs operate in parallel makes these types of attacks far less likely to succeed. Both RSA and ECC can benefit from this fact.

As already mentioned, this architecture can be an option for wireless devices, although here we have chosen to present a product devoted to network security. Again, because of space limitations we were not able to discuss the smaller, compact implementation, which also features a very secure low-power design with attractive performance.

Örs et al. characterized the power consumption of a Xilinx Virtex 800 FPGA in [24]. They showed that it is possible to draw conclusions about the vulnerability of an ordinary ASIC in CMOS technology by performing power-analysis attacks on an FPGA implementation. In this respect, an FPGA design can serve as a good model for an ASIC platform, not just for the usual hardware-related properties but also for security.

8 Conclusions 

We have presented a hardware implementation on a systolic array architecture that is scalable in all parameters and ideally suited for RSA and ECC algorithms.

We have also introduced a bound on the Montgomery parameter R which allows us to perform the most efficient point addition and doubling for ECC, as well as modular exponentiation. Even in the case of CRT, Montgomery's algorithm is proven to be the best option for side-channel resistance.
Abstract. Sedlak’s [Sed] modular multiplication algorithm is one of the first real silicon implementations to speed up RSA signature generation [RSA] on a smartcard, cf. [DQ]. Theoretically, Sedlak’s algorithm needs on average $n/3$ steps (i.e., additions/subtractions) to compute the modular product of $n$-bit numbers. In [FS2] we presented a theoretical algorithm showing how to speed up Sedlak’s algorithm by an arbitrary integral factor $i \ge 2$, i.e., our new algorithm needs on average $n/(3 \cdot i)$ steps in order to compute the modular product of $n$-bit numbers. As an extension of [FS2], the present paper shows how this theoretical framework can be turned into a practical implementation.

Keywords: Booth recoding, Computer arithmetic, Implementation issues, Sedlak’s algorithm, Modular multiplication.



1 Introduction 

Without doubt, all of today’s widely used public-key cryptography relies on modular arithmetic. Here, the most interesting operation is the modular multiplication. Thus, fast algorithms and implementations of the modular multiplication have always been in the focus of cryptographic hardware investigations. This is witnessed by a tremendous amount of literature on this constantly growing field of research, cf. [BA,Br,Gro,WQ,DJQ,Q,Mon,STK,Om,Wa].

Although Sedlak’s [Sed] modular multiplication algorithm was one of the first real silicon implementations, it has never received much scientific attention. While his original algorithm needs on average $n/3$ steps (i.e., additions/subtractions) to compute the modular product of $n$-bit numbers, it was only very recently shown in [FS2] how to speed up Sedlak’s algorithm by an arbitrary integral factor $i \ge 2$. Theoretically, the new algorithm needs on average only $n/(3 \cdot i)$ steps to compute the modular product of $n$-bit numbers. As a continuation of [FS2], the present paper shows how this theoretical framework can be turned into a practical implementation. Thus, we investigate all the subtle implementation issues involved in turning the former theoretical algorithm of [FS2] into a real-world algorithm. In addition to this “silicon-ready” implementation, we also present practical performance results and our silicon implementation.

The paper is organized as follows. Section 2 recapitulates the results of [FS2], i.e., we recall Booth’s algorithm, explain the ZDN reduction, and then show how to merge both algorithms in order to get the simple as well as the double ZDN-based modular multiplication. Sections 3, 4 and 5 are concerned with the real-life implementations of the two former algorithms. Section 6 shows the principle of the multiple ZDN-based modular multiplication. Finally, in Section 7 we give practical results for the algorithms.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 264-277, 2004.

2 Multiplication with Booth, the ZDN-Reduction, the 
Simple and Double ZDN-Based Modular Multiplication 

In the following, we give a short résumé of the principles of the ZDN-based modular multiplication as it was introduced in [FS2]. However, the reader is strongly encouraged to read [FS2] for more details. First, we recall the multiplication, the reduction and their amalgamation into the modular multiplication. All algorithms in this section are purely mathematical! The variables in capital letters contain elements of $\mathbb{Q}$, or more precisely, of $\bigcup_{j \in \mathbb{N}} 2^{-j}\mathbb{Z}$. So a division by 2 will not destroy information (which happens if one is working with physical registers of finite length).

In this paper we are concerned with the computation of $\alpha \cdot \beta \bmod \nu$, where $2^{n-1} \le \nu < 2^n$ and $0 \le \alpha, \beta < \nu$ for some integer $n$ of size, e.g., 1024 or 2048. Our starting point in [FS2] was the following simple and straightforward textbook algorithm Modular Multiplication I (see figure), where $\beta_i$ denotes the $i$-th bit of $\beta$, i.e., $\beta = (\beta_{n-1}, \dots, \beta_0)_2$ in binary representation. For our approach an equivalent variant, Modular Multiplication II, was used.



input: α, β, ν
output: γ := α·β mod ν
Z := 0; C := α, N := ν
for i := n − 1 downto 0 do
    Z := Z·2
    if β_i = 1 then Z := Z + C endif
    /* now Z = α·(β_{n−1},...,β_i)_2 */
endfor
/* now Z = α·β, 0 ≤ Z < ν·2^n */
for i := n − 1 downto 0 do
    if Z ≥ N·2^i then Z := Z − N·2^i endif
    /* now Z = α·β mod ν·2^i */
endfor
return Z

Modular Multiplication I.

input: α, β, ν
output: γ := α·β mod ν
Z := 0; C := α, N := ν
for i := n − 1 downto 0 do
    C := C/2
    if β_i = 1 then Z := Z + C endif
    /* Z = α·(β_{n−1},...,β_i)_2·2^{i−n} */
endfor
/* Z = α·β·2^{−n}, 0 ≤ Z < N */
for i := n − 1 downto 0 do
    Z := Z·2
    if Z ≥ N then Z := Z − N endif
    /* now Z = α·β·2^{−i} mod N */
endfor
return Z

Modular Multiplication II.



To enhance the average shift value 1 (over the multiplier) of the multiplication shown in the former algorithms, the classical method of Booth [Bo] can be used. This algorithm is described in [Mac,Kor,Spa,Par] and asymptotically achieves an average shift value of 3. It requires variable shifts and the ability to subtract numbers. The method of Booth is based on representing numbers in the signed-digit notation SD2: let $\beta$ be an $n$-bit integer in binary notation, $\beta = (\beta_{n-1}, \dots, \beta_0)_2$, i.e., $\beta = \sum_{i=0}^{n-1} \beta_i 2^i$. Then there exists a representation of $\beta$ in the form $\beta = \sum_{i=0}^{n} \beta_i 2^i$ (also written as $(\beta_n, \dots, \beta_0)_{SD2}$) where $\beta_i \in \{-1, 0, 1\}$. Among these representations there is one with a minimal Hamming weight $H(\beta)$. For these representations the expectation value is known to be $E(H(\beta)) = (n+1)/3$. With the algorithms described in [Kor,Mac,JY] such a representation can be obtained efficiently on the fly. By virtue of this representation, one can define the two equivalent multiplication algorithms Multiplication I and Multiplication II:



input: α, β
output: γ := α·β
Z := 0, C := α, m := n + 1
while m > 0 do
    LABooth(β, &m, &s, &v)
    Z := Z·2^s
    Z := Z + v·C
    /* now Z = α·(β_n,...,β_m)_SD2 */
endwhile
return Z

Multiplication I.

input: α, β
output: γ := α·β
Z := 0, C := α, m := n + 1, c := 0
while m > 0 do
    LABooth(β, &m, &s, &v)
    C := C·2^{−s}
    Z := Z + v·C
    c := c + s    /* = n + 1 − m */
    /* Z·2^c = α·(β_n,...,β_m)_SD2 */
endwhile
return Z·2^c

Multiplication II.



Here, the subroutine LABooth(β, &m, &s, &v) provides the shift value $s$, the sign $v$ (according to Booth) and the current position $m$ in the multiplier $\beta$ for the arithmetic step. It manipulates the variables $m$, $s$ and $v$ (denoted by the &-sign).

Note that $E(H(\beta)) = (n+1)/3$ implies that $E(s) = 3$, which means that the above algorithm asymptotically achieves a performance factor of 3 compared to the simple multiplication. Or, one can say “one has an average shift value of 3”.
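A concrete LABooth makes Multiplication II easy to try out. The Python sketch below is only an illustration under assumptions: it swaps in the non-adjacent form (NAF) as the minimal-weight SD2 recoding instead of the on-the-fly algorithms of [Kor,Mac,JY], and it uses exact `fractions.Fraction` values in place of the “purely mathematical” registers. The names `naf`, `la_booth` and `multiplication_ii` are hypothetical. Each LABooth step jumps to the next nonzero digit, or shifts straight down to position 0 with $v = 0$ when only zero digits remain.

```python
from fractions import Fraction


def naf(beta, n):
    """Minimal-weight SD2 digits (non-adjacent form) of 0 <= beta < 2**n,
    least significant first, padded to n + 1 digits."""
    digits = []
    while beta:
        if beta & 1:
            d = 2 - (beta & 3)   # +1 if beta = 1 (mod 4), -1 if beta = 3 (mod 4)
            beta -= d
        else:
            d = 0
        digits.append(d)
        beta >>= 1
    return digits + [0] * (n + 1 - len(digits))


def la_booth(digits, m):
    """Hypothetical LABooth: from position m, jump to the next nonzero digit below.

    Returns (s, v, new_m); if only zero digits remain, shift down to 0 with v = 0.
    """
    for pos in range(m - 1, -1, -1):
        if digits[pos] != 0:
            return m - pos, digits[pos], pos
    return m, 0, 0


def multiplication_ii(alpha, beta, n):
    """Multiplication II: C is divided by 2**s instead of shifting Z left."""
    digits = naf(beta, n)
    z, c_reg, m, c, steps = Fraction(0), Fraction(alpha), n + 1, 0, 0
    while m > 0:
        s, v, m = la_booth(digits, m)
        c_reg /= 2 ** s
        z += v * c_reg
        c += s                       # invariant: c = n + 1 - m
        steps += 1
    return z * 2 ** c, steps
```

Dividing $n + 1$ by the returned step count gives an empirical average shift value for a particular multiplier.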

Now, since the classical variable-shift Booth algorithm asymptotically achieves a speed-up by a factor of 3, we have to provide a reduction algorithm of the same speed. This is accomplished by the so-called ZDN algorithm (ZDN = “Zwei Drittel N”, German for “two thirds of N”), which is based on the following lemma:

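Lemma 1. Let $\nu \in \mathbb{N}$ and $\zeta$ a real number with $\zeta \in [-\nu, \nu[$. Furthermore, let $s_\zeta \in \mathbb{N}_0 \cup \{\infty\}$ be the unique integer such that $\zeta \cdot 2^{s_\zeta} \in [-\tfrac{4}{3}\nu, -\tfrac{2}{3}\nu[ \,\cup\, [\tfrac{2}{3}\nu, \tfrac{4}{3}\nu[$, or, if $\zeta = 0$, we set $s_\zeta := \infty$. Then $\zeta \cdot 2^{s_\zeta} - \operatorname{sign}(\zeta) \cdot \nu \in [-\tfrac{1}{3}\nu, \tfrac{1}{3}\nu[$.

If $\zeta\colon \Omega \to [-\tfrac{1}{3}\nu, \tfrac{1}{3}\nu[$ is a uniformly distributed random variable, then the expectation value of $s_\zeta$ is given by $E(s_\zeta) = 3$.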
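The expectation claim can be checked directly under the following assumptions. The shift is chosen so that $|\zeta| \cdot 2^{s}$ lands in $[\tfrac{2}{3}\nu, \tfrac{4}{3}\nu[$, so subtracting $\operatorname{sign}(\zeta)\cdot\nu$ leaves a value in $[-\tfrac{1}{3}\nu, \tfrac{1}{3}\nu[$. In addition, $\zeta$ is taken to be uniform on that interval, which is the range one reduction step produces. By symmetry it suffices to take $|\zeta|$ uniform on $[0, \tfrac{1}{3}\nu[$. For $k \le 1$ the target interval does not meet this range, while for $k \ge 2$ it lies entirely inside it:

$$
P(s_\zeta = k) = \frac{\left(\tfrac{4}{3} - \tfrac{2}{3}\right)\nu\,2^{-k}}{\tfrac{1}{3}\nu} = 2^{1-k} \quad (k \ge 2),
\qquad
E(s_\zeta) = \sum_{k \ge 2} k\, 2^{1-k} = 2\left(\sum_{k \ge 1} k\,2^{-k} - \tfrac{1}{2}\right) = 2\left(2 - \tfrac{1}{2}\right) = 3 .
$$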

This lemma immediately leads to the so-called ZDN reduction, Reduction II, which replaces the (somewhat) classical reduction algorithm Reduction I used in Modular Multiplication II.
input: ζ, ν with 0 ≤ ζ < 2^c·ν
output: γ := ζ mod ν
Z := ζ·2^{−c}, N := ν
while c > 0 do
    Z := Z·2
    if Z ≥ N then Z := Z − N endif
    c := c − 1
    /* now Z·2^c = (ζ mod ν·2^c) */
endwhile
return Z

Reduction I.

input: ζ, ν with 0 ≤ ζ < 2^c·ν
output: γ := ζ mod ν
Z := ζ·2^{−c}, N := ν
while c > 0 do
    LARed(Z, N, c, &s, &v)
    Z := Z·2^s + v·N
    c := c − s
    /* now Z·2^c ≡ ζ (mod ν·2^c) */
endwhile
if Z < 0 then Z := Z + N endif
return Z

Reduction II.



The subroutine LARed(Z, N, max, &s, &v) provides the algorithm with the shift value $s$ (bounded by max) and the appropriate sign value $v = -\operatorname{sign}(Z)$ according to the lemma.
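A Python sketch of Reduction II with exact rationals shows how LARed and the loop fit together. It assumes one particular reading of LARed: return the smallest $s \le$ max with $|Z \cdot 2^s| \ge \tfrac{2}{3}N$ together with $v = -\operatorname{sign}(Z)$, and otherwise shift by max with $v = 0$. The names `la_red` and `reduction_ii` are hypothetical, and register widths are ignored.

```python
from fractions import Fraction


def la_red(z, n, max_shift):
    """Hypothetical LARed following Lemma 1 (exact rational arithmetic).

    Returns (s, v): the smallest s <= max_shift with |Z * 2**s| >= 2/3 * N and
    v = -sign(Z). If there is no such s (or Z == 0), returns (max_shift, 0):
    shift as far as allowed and do not add N. For |Z| < 4/3 * N the shifted
    value then lies in [2/3 * N, 4/3 * N[ as in the lemma.
    """
    if z == 0:
        return max_shift, 0
    s = 0
    while abs(z) * 2 ** s < Fraction(2, 3) * n:
        s += 1
        if s > max_shift:
            return max_shift, 0
    return s, (-1 if z > 0 else 1)


def reduction_ii(zeta, nu, c):
    """Reduction II: returns (zeta mod nu, steps) for 0 <= zeta < 2**c * nu."""
    z, steps = Fraction(zeta, 2 ** c), 0
    while c > 0:
        s, v = la_red(z, nu, c)
        z = z * 2 ** s + v * nu      # Z * 2**c stays congruent to zeta mod nu
        c -= s
        steps += 1
    if z < 0:
        z += nu
    return z, steps
```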

Then the two former algorithms Multiplication II and Reduction II can be combined into one single algorithm. Starting with Modular Multiplication III (figure not shown here), a simple concatenation of the two algorithms, we merge the two occurring loops into a single one, resulting in Modular Multiplication IV. Finally, by reordering the loop we get Modular Multiplication V, the final version of the ZDN-based modular multiplication algorithm. We also change the notation of the parameters to the obvious ones ($s_Z := s_2$, $s_\beta := s_1$, $v_N := v_2$, $v_C := v_1$).



input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
m := n + 1, c := 0
while m > 0 or c > 0 do
    LABooth(β, &m, &s_1, &v_1)
    C := C·2^{−s_1}
    Z := Z + v_1·C
    c := c + s_1
    LARed(Z, N, c, &s_2, &v_2)
    C := C·2^{s_2}
    Z := Z·2^{s_2} + v_2·N
    c := c − s_2
endwhile
if Z < 0 then Z := Z + N endif
return Z

Modular Multiplication IV.

input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
m := n + 1, c := 0
while m > 0 or c > 0 do
    LARed(Z, N, c, &s_Z, &v_N)
    LABooth(β, &m, &s_β, &v_C)
    s_C := s_Z − s_β
    C := C·2^{s_C}
    Z := Z·2^{s_Z} + v_C·C + v_N·N
    c := c − s_C
endwhile
if Z < 0 then Z := Z + N endif
return Z

Modular Multiplication V.



The big advantage of this version is given by the following fact: one can now substitute the two single additions/subtractions $Z := Z + v_C \cdot C$ and $Z := Z \cdot 2^{s_Z} + v_N \cdot N$ by one single 3-operand addition $Z := Z \cdot 2^{s_Z} + v_C \cdot C + v_N \cdot N$. In the case of our architecture, the extra overhead of this is also negligible. All in all, this results in a nearly doubled performance, since the two arithmetic operations, which need at least 2 clock cycles, are substituted by a one-clock-cycle operation.

Remarks on the behavior of the algorithm: The algorithm has an asymptotically average shift value of 3, meaning that the number of loops is $(n+1)/3$. This would be obvious for the single LABooth, as $E(s_\beta) = 3$. However, LARed is subject to the technical constraint “max = c”. In particular, the reduction is slowed down in the beginning, i.e., one can no longer assume that $E(s_Z) = 3$ holds in general. However, we can say $E(s_Z) \to 3$ for $n \to \infty$. This can be seen as follows. If $E(s_Z) = 3 - \delta$ for some $\delta > 0$, then $c$ would be increased by $\delta$ per loop. From some sufficiently large $c$ on, the slowdown caused by max = c disappears, again implying $E(s_Z) \to 3$. In general, a modular multiplication shows the following behaviour (for large $n$). In the beginning LABooth dominates, thereby increasing the value $c$ a little. Then LARed also reaches its full performance, i.e., $c$ goes up and down fairly randomly but keeps its middle position. Since the Booth algorithm is already a bit ahead, it reaches the end of the multiplication before the reduction stops. LARed then has to reduce the value $c$ down to 0, thus finishing the algorithm.
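To watch this behaviour, one can run Modular Multiplication V on exact rationals and count loops. The Python sketch below assumes two things: a NAF recoding stands in for LABooth, and LARed is read as in Lemma 1 with the bound max = c. Finite registers, shifters and the two's complement issues of the later sections are deliberately ignored. All function names are hypothetical. The helpers are repeated here so that the block is self-contained.

```python
from fractions import Fraction


def naf(beta, n):
    """Minimal-weight SD2 digits of 0 <= beta < 2**n (LSB first, n + 1 digits)."""
    digits = []
    while beta:
        d = (2 - (beta & 3)) if beta & 1 else 0   # +1, -1 or 0
        digits.append(d)
        beta = (beta - d) >> 1
    return digits + [0] * (n + 1 - len(digits))


def la_booth(digits, m):
    """Hypothetical LABooth: (s_beta, v_C, new_m) for the next nonzero digit below m.

    If only zero digits remain, it shifts down to 0 with v_C = 0
    (so it returns (0, 0, 0) once m = 0).
    """
    for pos in range(m - 1, -1, -1):
        if digits[pos]:
            return m - pos, digits[pos], pos
    return m, 0, 0


def la_red(z, nu, max_shift):
    """Hypothetical LARed (Lemma 1): (s_Z, v_N) with s_Z <= max_shift."""
    if z == 0:
        return max_shift, 0
    s = 0
    while abs(z) * 2 ** s < Fraction(2, 3) * nu:
        s += 1
        if s > max_shift:
            return max_shift, 0
    return s, (-1 if z > 0 else 1)


def zdn_mod_mul_v(alpha, beta, nu, n):
    """Modular Multiplication V on exact rationals; returns (alpha*beta mod nu, loops)."""
    digits = naf(beta, n)
    z, c_reg = Fraction(0), Fraction(alpha)
    m, c, loops = n + 1, 0, 0
    while m > 0 or c > 0:
        s_z, v_n = la_red(z, nu, c)              # max = c
        s_b, v_c, m = la_booth(digits, m)
        s_c = s_z - s_b
        c_reg *= Fraction(2) ** s_c
        z = z * 2 ** s_z + v_c * c_reg + v_n * nu   # the single 3-operand addition
        c -= s_c
        loops += 1
    if z < 0:
        z += nu
    return z, loops
```

Comparing the returned loop count with $(n+1)/3$ for random operands gives an empirical feel for the remarks above; no particular numbers are claimed here.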

The development of the last algorithm was driven by the fact that it is much more efficient (in terms of speed) to merge two single 2-operand additions into one single 3-operand addition. This was accomplished by a slightly more complicated loop control structure. Of course, this idea can be continued in a natural (naive) fashion: simply merge two successive loop iterations of Modular Multiplication V into one single loop iteration. Although this will again complicate the loop control structure, we start with this naive approach. First, in Loop I (not shown here) we simply stick together two successive loop iterations. In the following we use upper indices which are not exponents! The parts of the first and the second loop are denoted by the upper indices 1 and 2, respectively. Although the second LABooth can be executed directly after the first LABooth (remember that it is independent of anything else), the situation with the second LARed is not so easy. Indeed, the second LARed depends on the result Z, which is influenced by the first LARed. So we have to do some pre-computation! This is shown in the first parameter of the second LARed in Loop II.

To make things easier, the second LARed will not receive $Z \cdot 2^{s_Z^1} + v_C^1 \cdot C + v_N^1 \cdot N$ as input, but simply the approximation $Z \cdot 2^{s_Z^1} + v_N^1 \cdot N$. Here we assume that the influence of C will be irrelevant in most cases. In some rare cases (when c is very small) a “wrong” reduction value can be delivered. However, this reduction value is not really wrong, only sub-optimal in terms of reduction speed. On the other hand, we save the “shifting” step of Z by $s_Z^2$ for the next loop. This yields Modular Multiplication VI.

Asymptotically, this algorithm needs $(n+1)/6$ loop cycles in theory. This can be seen similarly to the last paragraph and will be explained in a full version of the paper. Clearly, the “small” side computation $Z \cdot 2^{s_Z^1} + v_N^1 \cdot N$ needs to be done only on a small fraction of the leading register bits. This is due to the fact that LARed(Z, N, ...) and LARed(Z·2^t, N·2^t, ...) return the same values $s$ and $v$ for any $t \in \mathbb{N}$.
LARed(Z, N, c, &s_Z^1, &v_N^1)
LABooth(β, &m, &s_β^1, &v_C^1)
LARed(Z·2^{s_Z^1} + v_C^1·C + v_N^1·N, N, c − s_Z^1 + s_β^1, &s_Z^2, &v_N^2)
LABooth(β, &m, &s_β^2, &v_C^2)
s_C^1 := s_Z^1 − s_β^1
C := C·2^{s_C^1}
Z := Z·2^{s_Z^1} + v_C^1·C + v_N^1·N
c := c − s_C^1
s_C^2 := s_Z^2 − s_β^2
C := C·2^{s_C^2}
Z := Z·2^{s_Z^2} + v_C^2·C + v_N^2·N
c := c − s_C^2

Loop II.



input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
m := n + 1, c := 0
while m > 0 or c > 0 do
    LARed(Z, N, c, &s_Z, &v_N^1)
    LARed(Z·2^{s_Z} + v_N^1·N, N, c − s_Z, &s_N, &v_N^2)
    LABooth(β, &m, &s_β^1, &v_C^1)
    LABooth(β, &m, &s_β^2, &v_C^2)
    s_C^1 := s_Z − s_β^1
    s_C^2 := −s_β^2
    c := c − s_C^1 − s_C^2
    C := C·2^{s_C^1 + s_C^2}
    Z := Z·2^{s_Z} + v_C^1·(C·2^{−s_C^2}) + v_N^1·N + v_C^2·C + v_N^2·(N·2^{−s_N})
endwhile
if Z < 0 then Z := Z + N endif
return Z

Modular Multiplication VI.
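The doubled loop can also be sketched in Python with exact rationals. The assumptions are the same as for the sketch of Modular Multiplication V: NAF stands in for LABooth, and LARed follows Lemma 1 with a shift bound. As described above, the second LARed only sees the approximation $Z \cdot 2^{s_Z} + v_N^1 \cdot N$, and the second reduction shifts N down instead of shifting Z. In this model the approximation affects only how quickly c is reduced, not the congruence of the result. All names are hypothetical.

```python
from fractions import Fraction


def naf(beta, n):
    """Minimal-weight SD2 digits of 0 <= beta < 2**n (LSB first, n + 1 digits)."""
    digits = []
    while beta:
        d = (2 - (beta & 3)) if beta & 1 else 0
        digits.append(d)
        beta = (beta - d) >> 1
    return digits + [0] * (n + 1 - len(digits))


def la_booth(digits, m):
    """Hypothetical LABooth: (s, v, new_m); (0, 0, 0) once m = 0."""
    for pos in range(m - 1, -1, -1):
        if digits[pos]:
            return m - pos, digits[pos], pos
    return m, 0, 0


def la_red(z, nu, max_shift):
    """Hypothetical LARed (Lemma 1) with shift bound max_shift."""
    if z == 0:
        return max_shift, 0
    s = 0
    while abs(z) * 2 ** s < Fraction(2, 3) * nu:
        s += 1
        if s > max_shift:
            return max_shift, 0
    return s, (-1 if z > 0 else 1)


def zdn_mod_mul_vi(alpha, beta, nu, n):
    """Modular Multiplication VI: two merged iterations, one 5-operand addition."""
    digits = naf(beta, n)
    z, c_reg = Fraction(0), Fraction(alpha)
    m, c, loops = n + 1, 0, 0
    while m > 0 or c > 0:
        s_z, v_n1 = la_red(z, nu, c)
        # the second LARed sees only Z*2^s_Z + v_N^1*N (the C term is ignored)
        s_n, v_n2 = la_red(z * 2 ** s_z + v_n1 * nu, nu, c - s_z)
        s_b1, v_c1, m = la_booth(digits, m)
        s_b2, v_c2, m = la_booth(digits, m)
        s_c1, s_c2 = s_z - s_b1, -s_b2
        c -= s_c1 + s_c2
        c_reg *= Fraction(2) ** (s_c1 + s_c2)
        z = (z * 2 ** s_z + v_c1 * c_reg * Fraction(2) ** (-s_c2) + v_n1 * nu
             + v_c2 * c_reg + v_n2 * Fraction(nu, 2 ** s_n))
        loops += 1
    if z < 0:
        z += nu
    return z, loops
```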



3 Implementing the Simple ZDN-Based Modular 
Multiplication 

Now we want to map the algorithm Modular Multiplication V onto a computer architecture. Therefore, we first summarize some of the specific properties of the algorithm which determine the architecture:

(i) $Z$ takes positive and negative values, i.e., $Z \in [-N, N]$ in steps of $2^{-k}$ for some $k$. (ii) $C$ will be divided/multiplied by powers of 2. (iii) $Z$ will only be multiplied by powers of 2. (iv) $N$ does not change at all. (v) There is a 3-operand addition. (vi) The multiplier $\beta$ affects the computation only through the values $s_\beta$ and $v_C$.

These properties are mirrored by the following rough architecture description. There is one calculation unit and one control unit. (i) The calculation unit consists of 3 registers N, C and Z of bit length $rl := n + 1 + k$ for some $k$ (say, $k = 32$). There is a virtual comma at bit position $k$ such that the interval $[-2^n, 2^n[$ is realized in steps of $2^{-k}$ (in two's complement representation). (ii) The register C can be shifted by the values $s_C \in \{-ShL, \dots, +ShL\}$. (iii) There is a shifter which latches Z into the adder with shift values $s_Z \in \{0, \dots, ShL\}$. (iv) The register N has no additional features. (v) There is a realization of the 3-operand adder. (vi) The control unit holds information about the multiplier $\beta$ and parts of Z, and delivers the control signals $s_Z, s_C, v_C, v_N$ for the adder.

input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
m := n + 1, c := 0, lsb := rl − n − 1             /* 5) */
while m > 0 or c > 0 do
    max := min{ShL, c}                              /* 1) */
    LARed(Z, N, max, &s_Z, &v_N)
    if c = s_Z and m > 0 then                       /* 2) */
        s_Z := max{0, s_Z − 1}
        v_N := 0
    endif
    max := min{lsb, ShL} + s_Z                      /* 3) */
    LABooth(β, max, &m, &s_β, &v_C)                 /* 4) */
    s_C := s_Z − s_β
    c := c − s_C
    lsb := lsb + s_C
    C ← C << s_C
    Z ← (Z << s_Z) + v_C·C + v_N·N
endwhile
if Z < 0 then Z ← Z + N endif
return Z

Modular Multiplication VII.

input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
Ẑ ← Z, N̂ ← N (copy only the top ShL + 4 bits)
m := n + 1, c := 0, lsb := rl − n − 1
while m > 0 or c > 0 do
    max := min{ShL, c}
    LARed(Ẑ, N̂, max, &s_Z, &v_N)
    if c < ShL + 4 and m > 0 then
        s_Z := 0
        v_N := 0
    endif
    max := min{lsb, ShL} + s_Z
    LABooth(β, max, &m, &s_β, &v_C)
    s_C := s_Z − s_β
    c := c − s_C
    lsb := lsb + s_C
    Ẑ ← Z (copy only the top 2·ShL + 4 bits)
    C ← C << s_C
    Z ← (Z << s_Z) + v_C·C + v_N·N
    Ẑ ← (Ẑ << s_Z) + v_N·N̂
endwhile
if Z < 0 then Z ← Z + N endif
return Z

Modular Multiplication VIII.

To stress one point again: the three registers of length $rl = n + 1 + k$ are used in such a way that they can contain numbers of the interval $[-2^n, 2^n[$ in steps of $2^{-k}$. In other words, if $Z = (z_{n+k}, \dots, z_0)$, then usually the contents of Z would be interpreted as the binary number $(z_{n+k}, \dots, z_0)_2$, at least modulo $2^{n+k+1}$. Here, however, it is interpreted as the rational number $(z_{n+k}, \dots, z_0)_2 \cdot 2^{-k}$, modulo $2^{n+1}$.
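The register convention, with $rl = n + 1 + k$ bits in two's complement and a virtual comma at bit $k$, can be modelled in a few lines of Python. The helpers `to_register` and `from_register` are hypothetical and exist only to show the encoding. The last check illustrates the third practical constraint listed below: a value that is too big wraps around and is read back as a negative number.

```python
from fractions import Fraction


def to_register(x, n, k):
    """Encode a rational x in [-2**n, 2**n[ (step 2**-k) as an rl-bit word."""
    rl = n + 1 + k
    scaled = Fraction(x) * 2 ** k
    if scaled.denominator != 1:
        raise ValueError("x is not a multiple of 2**-k")
    return int(scaled) % 2 ** rl          # wraps modulo 2**rl like a hardware register


def from_register(word, n, k):
    """Read an rl-bit word as two's complement with a virtual comma at bit k."""
    rl = n + 1 + k
    if word >= 2 ** (rl - 1):             # top bit set: negative number
        word -= 2 ** rl
    return Fraction(word, 2 ** k)
```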

Since the value in register C will be shifted up and down, the variable lsb keeps track of its actual position such that no bits are lost. (Hence lsb has to lie between 0 and k.) The algorithm Modular Multiplication VII can now be provided. Note that we now use C-style notation for shifting, i.e., $C \ll s$ denotes multiplication by $2^s$. Further note that negative shift values are allowed, so that $C \ll s_C = C \gg (-s_C)$. Of course, some practical constraints are given:

1. Finite register length.

2. Finite shifter length.

3. The two's complement representation of numbers in Z, i.e., a number that is too big could be interpreted as a negative number.

This leads to the algorithm Modular Multiplication VII. We briefly comment on all of the additional technical issues, which are marked by footnotes:



1. The constraint in 1) is necessary due to the finite shifter length ShL of Z.

2. Due to the finite shifter length the following situation is possible: the values returned by LARed are $v_N = 0$ and an $s_Z$ such that $(Z \ll s_Z) \approx \tfrac{2}{3} N$. In addition, we may have $s_\beta = 1$ (which is the minimum during the multiplication phase) and $v_C = 1$. For an $\alpha \approx N$ this means that the new value of Z is $(Z \ll s_Z) + C + 0 \cdot N \approx \tfrac{2}{3} N + \tfrac{1}{2} N > N$. All in all, Z could then erroneously appear negative in the two's complement representation. Therefore, with 2) we decrement $s_Z$ by one, now obtaining $(Z \ll s_Z) + C + 0 \cdot N \approx \tfrac{1}{3} N + \tfrac{1}{2} N < N$.

3. Because of the finite shifter length and the fact that C must not leave the register to the right, we need $-\min\{lsb, ShL\} \le s_C\ (= s_Z - s_\beta)$ and therefore $s_\beta \le \min\{lsb, ShL\} + s_Z$, which is realized by virtue of the next point 4).

4. The maximal output value $s_\beta$ of LABooth is bounded by max in the obvious way. Therefore, we use LABooth with an additional parameter:

    LABooth(β, max, &m, &s, &v)
        LABooth(β, &m, &s, &v)
        if s > max then
            m ← m + s − max
            s ← max
            v ← 0
        endif

5. The lsb controls the position of α in the register C. We assume that the algorithm starts with an lsb greater than some constant threshold value k, e.g., 32 bits.
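The bounded LABooth of point 4) can be checked in isolation with the following Python sketch. It assumes an unbounded LABooth built on NAF recoding, and the names `la_booth_max` and `recode_with_bound` are hypothetical. When the natural shift exceeds max, the sketch shifts by max with $v = 0$ and moves $m$ back, so the pending digit is consumed in a later step. Rebuilding $\beta$ in the style of Multiplication I ($P := P \cdot 2^s + v$) then gives back $\beta$ for any bound $\ge 1$. With a bound of 1, every step shifts by exactly one position.

```python
def naf(beta, n):
    """Minimal-weight SD2 digits of 0 <= beta < 2**n (LSB first, n + 1 digits)."""
    digits = []
    while beta:
        d = (2 - (beta & 3)) if beta & 1 else 0
        digits.append(d)
        beta = (beta - d) >> 1
    return digits + [0] * (n + 1 - len(digits))


def la_booth(digits, m):
    """Hypothetical unbounded LABooth: (s, v, new_m) for the next nonzero digit below m."""
    for pos in range(m - 1, -1, -1):
        if digits[pos]:
            return m - pos, digits[pos], pos
    return m, 0, 0


def la_booth_max(digits, m, max_shift):
    """LABooth(beta, max, &m, &s, &v) as in point 4): clip s and postpone the digit."""
    s, v, new_m = la_booth(digits, m)
    if s > max_shift:
        new_m += s - max_shift       # m := m + s - max (digit not consumed yet)
        s, v = max_shift, 0          # s := max, v := 0
    return s, v, new_m


def recode_with_bound(beta, n, max_shift):
    """Rebuild beta from bounded steps, Multiplication I style: P := P * 2**s + v."""
    if max_shift < 1:
        raise ValueError("max_shift must be at least 1 to make progress")
    digits, m, p, steps = naf(beta, n), n + 1, 0, 0
    while m > 0:
        s, v, m = la_booth_max(digits, m, max_shift)
        p = p * 2 ** s + v
        steps += 1
    return p, steps
```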



Performance Analysis: We have seen before that this algorithm needs asymptotically $(n+1)/3$ loop cycles in theory. One loop cycle contains one 3-operand addition, which in a real hardware realization can be performed on average within 1 clock cycle. However, this does not mean that the whole modular multiplication can be performed within $(n+1)/3$ clock cycles. Both functions LABooth and LARed have to be computed, and LARed in particular needs the result of the previous 3-operand addition. That means that we have to spend at least one additional clock cycle for both functions LABooth and LARed, which leads to a two-cycle loop realization. All in all, we would need on average about $2n/3$ cycles to perform one modular multiplication. Nevertheless, the next section shows how to perform both functions LABooth and LARed in parallel with a simultaneously running 3-operand addition. This means that LARed must use a different main input for Z, because the currently performed 3-operand addition only finishes at the moment when LARed must already have delivered its values.
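As a rough worked example for $n = 2048$, the asymptotic loop count can be taken at face value while ignoring the cost of the control logic:

$$
\frac{n+1}{3} = \frac{2049}{3} = 683 \ \text{loops}, \qquad
2 \cdot 683 = 1366 \ \text{cycles (two-cycle loop)}, \qquad
683 \ \text{cycles (one-cycle loop)}.
$$

Actual loop counts depend on the register and shifter lengths, so these figures are only indicative.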



4 An Enhanced Implementation 

It seems that the chain of dependencies forces LARed to use the result of the previously computed 3-operand addition. However, this dependence can be avoided if a good approximation of Z can be computed very fast, as derived from the following thoughts:
1. As $s_Z$ is bounded by the maximal shift value ShL anyway, it suffices to know only the top (say ShL + 4) bits of Z in order to compute $s_Z$ and $v_N$ within LARed.

2. Assuming that $c$ has become large enough a few steps after starting the algorithm (e.g., if $c > 2 \cdot ShL + 4$), the topmost bits of $Z := (Z \ll s_Z) + v_C \cdot C + v_N \cdot N$ can be approximated fairly well by $\hat Z := (Z \ll s_Z) + v_N \cdot N$. Moreover, this last computation needs to be performed only on the topmost bits of Z.

3. Such an addition can be computed in parallel with the “big” addition, but much faster than the latter.

4. Now, during the remaining time, i.e., while the “big” addition is finishing, LABooth and LARed can be completed. This saves one whole clock cycle at every step, at the expense of using hardware parallelism together with a good approximation of the currently evaluated Z.

This leads to a fairly good description of an actually existing hardware implementation of the ZDN multiplication: Modular Multiplication VIII.

Considering this new algorithm, we see that we had to strengthen some constraints after the LARed. The reason is that we can ignore the influence of C only if $c$ is large enough, i.e., only in this case do we get a good approximation $\hat Z$ for Z. We simply suppress the reduction until $c$ has become large enough. Note that the parallelism extends over two loops: while the “big” addition at the end of one loop is computed, the control functions of the next loop are evaluated!

A further improvement deserves its own small section: 

Modulus Transformation: We have seen that computing the control values of LARed is the most expensive operation of the control part. As we have seen above, we do not use the full N but rather a good approximation $\hat N$, given by some of the topmost bits of N. Thus, we can simplify LARed.

Assume that the used topmost bits of N, i.e., $\hat N$, always have a particular simple and fixed form which is independent of N. E.g., if $\hat N = 0110\ldots0$, then $\tfrac{2}{3}\hat N = 010\ldots0$. This means that the $s_Z$ with $\tfrac{2}{3}\hat N \le (Z \ll s_Z) < \tfrac{4}{3}\hat N$ can be obtained by simply counting the leading zeros of Z. A circuit for that purpose is rather small and fast.
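For a modulus register of `width` bits holding exactly $N = 0110\ldots0_2 = 3 \cdot 2^{width-3}$, the window $[\tfrac{2}{3}N, \tfrac{4}{3}N[$ is $[2^{width-2}, 2^{width-1}[$. The LARed shift is therefore just a leading-zero count. The following Python check compares both ways of computing the shift. It assumes a positive Z (a hardware version would count leading sign bits of the two's complement value), and the function names are hypothetical.

```python
def shift_by_threshold(z, n):
    """Reference: smallest s >= 0 with 2/3 * N <= z * 2**s (for 0 < z < N)."""
    s = 0
    while 3 * z * 2 ** s < 2 * n:
        s += 1
    return s


def shift_by_leading_zeros(z, width):
    """Same shift for N = 0b0110...0 of `width` bits: leading-zero count minus one."""
    leading_zeros = width - z.bit_length()
    return leading_zeros - 1     # brings z to bit length width - 1
```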

Note that computing with such moduli $\nu$ does not mean any restriction. This is due to the fact that it is always possible, and rather simple, to find a very small number $t$ such that $t \cdot \nu$ has the desired form, cf. [DJQ,DQ,Q]. After computing the result modulo $t \cdot \nu$ (e.g., a full RSA exponentiation, cf. [RSA]), one simply reduces this result modulo $\nu$.
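One simple way to find such a $t$ is sketched below in Python. This construction is an assumption for illustration and not necessarily the method of [DJQ,DQ,Q]. To make the top bits of $t \cdot \nu$ read 11 followed by `zeros` zero bits (the bits after the sign bit in $0110\ldots0$), choose the bit length $L$ of $t \cdot \nu$ and take $t = \lceil 3 \cdot 2^{L-2} / \nu \rceil$. Then $t \cdot \nu$ lies in $[3 \cdot 2^{L-2}, 3 \cdot 2^{L-2} + \nu[$, and with $L = \text{bitlen}(\nu) + 2 + \text{zeros}$ the offset cannot reach the fixed bits, while $t \le 3 \cdot 2^{\text{zeros}+1}$ stays small. The helper name and the test modulus are hypothetical.

```python
def find_multiplier(nu, zeros):
    """Smallest t (for bit length L) such that t*nu starts with 0b11 + `zeros` zero bits."""
    L = nu.bit_length() + 2 + zeros          # bit length of t*nu
    target = 3 << (L - 2)                    # 0b11 followed by L - 2 zero bits
    t = -(-target // nu)                     # ceil(target / nu)
    return t, L
```

Working modulo $t \cdot \nu$ and reducing modulo $\nu$ at the end gives the same result, because $\nu$ divides $t \cdot \nu$.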

5 Implementing the Doubled ZDN-Based Modular 
Multiplication 

The way in which the algorithm Modular Multiplication VI can be realized in practice is very similar to that of the last two sections. Therefore, we fix a similar register architecture.




High-Speed Modular Multiplication 273 



input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
m := n + 1, c := 0, lsb := rl − n − 1
while m > 0 or c > 0 do
    max := min{c, ShL}                                  /* 1) */
    LARed(Z, N, max, &s_Z, &v_N^1)
    if s_Z = c and m > 0 then                           /* 2) */
        s_Z := max{0, s_Z − 1}
        v_N^1 := 0
    endif
    max := min{c − s_Z, ShL_N}
    LARed((Z << s_Z) + v_N^1·N, N, max, &s_N, &v_N^2)
    max := s_Z + min{lsb, ShL}                          /* 3) */
    LABooth(β, max, &m, &s_β^1, &v_C^1)
    s_C^1 := s_Z − s_β^1
    c := c − s_C^1
    lsb := lsb + s_C^1                                  /* 5) */
    max := min{lsb, ShL, ShL + s_C^1}                   /* 6) */
    LABooth(β, max, &m, &s_β^2, &v_C^2)                 /* 4) */
    s_C^2 := −s_β^2
    c := c − s_C^2
    lsb := lsb + s_C^2                                  /* 5) */
    C ← C << (s_C^1 + s_C^2)
    Z ← (Z << s_Z) + v_C^1·(C >> s_C^2) + v_N^1·N + v_C^2·C + v_N^2·(N >> s_N)
endwhile
if Z < 0 then Z ← Z + N endif
return Z

Modular Multiplication IX.

input: α, β, ν
output: γ := α·β mod ν
Z := 0, C := α, N := ν,
Ẑ ← Z, N̂ ← N (copy only the top ShL + ShL_N + 4 bits)
m := n + 1, c := 0, lsb := rl − n − 1
while m > 0 or c > 0 do
    max := min{c, ShL}
    LARed(Ẑ, N̂, max, &s_Z, &v_N^1)
    if c + s_C^2 < ShL + ShL_N + 2 and m_old > 0 then
        s_Z := 0
        v_N^1 := 0
    endif
    Ẑ ← (Ẑ << s_Z) + v_N^1·N̂
    max := min{c − s_Z, ShL_N}
    LARed(Ẑ, N̂, max, &s_N, &v_N^2)
    if s_N = c or s_N = 0 then
        s_N := 0
        v_N^2 := 0
    endif
    m_old := m
    max := s_Z + min{lsb, ShL}
    LABooth(β, max, &m, &s_β^1, &v_C^1)
    s_C^1 := s_Z − s_β^1
    c := c − s_C^1
    lsb := lsb + s_C^1
    max := min{lsb, ShL, ShL + s_C^1}
    LABooth(β, max, &m, &s_β^2, &v_C^2)
    s_C^2 := −s_β^2
    c := c − s_C^2
    lsb := lsb + s_C^2
    Ẑ ← Z (copy only the top 2·ShL + ShL_N + 4 bits)
    C ← C << (s_C^1 + s_C^2)
    Z ← (Z << s_Z) + v_C^1·(C >> s_C^2) + v_N^1·N + v_C^2·C + v_N^2·(N >> s_N)
    Ẑ ← (Ẑ << s_Z) + v_N^1·N̂ + v_N^2·(N̂ >> s_N)
endwhile
if Z < 0 then Z ← Z + N endif
return Z

Modular Multiplication X.



The calculation unit again consists of 3 registers Z, C and N of bit length $rl = n + 1 + k$. They are used for the following operations.

1. C can be shifted by the values −ShL, ..., +ShL.

2. A 5-operand addition with the following properties is realized:

$Z \leftarrow (Z \ll s_Z) + v_C^1 \cdot (C \gg s_C^2) + v_N^1 \cdot N + v_C^2 \cdot C + v_N^2 \cdot (N \gg s_N)$. Here, C and N enter the addition twice: each of C and N once directly and a second time latched through additional shifters, where N can be shifted down by 0, ..., ShL_N bits.

Plugging this modification into the same framework as before, the algorithm Modular Multiplication IX results. Again, we briefly comment on the modifications we have made:

- 1) to 5) have the same reasons as already explained in the above section.

- 6) The shifter limitation is chosen in such a way that $s_C^1 + s_C^2 \in [-ShL, ShL]$.

Now, for Modular Multiplication X we again want to parallelize the “big” addition and the computation of the look-ahead values (via LARed and LABooth) for the next addition. Clearly, some modifications of the above algorithm are again necessary.

6 The Multiple ZDN-Based Modular Multiplication 

The scheme used for merging two successive loops into one single loop can clearly be generalized to merging $i \ge 2$ successive loops. As everything should now be clear, we only give a schematic sketch of the multiple ZDN algorithm (cf. Fig. 1). This algorithm puts $i$ successive loop additions into one single addition, this time using a $(2 \cdot i + 1)$-operand addition. This results in an average asymptotic shift value of $3 \cdot i$ bits per cycle, i.e., about $(n+1)/(3 \cdot i)$ loop cycles.

Of course, the generation of the increasing number of control signals becomes more and more complicated. So this strategy only makes sense for very long registers, and it is only practicable for small $i$, e.g., $i \le 3$.

7 Practical Results 

Average shift values. As all theoretically derived results are only valid for $n \to \infty$ and unlimited shifters, it is interesting to consider real practical results. However, our algorithms are a mixture of multiplication and reduction, where the reduction always runs a little bit behind the multiplication: one can multiply in advance, but one cannot reduce in advance! Thus, it is difficult to give average shift values for the algorithms. The multiplication and the reduction each have their individual shift values. Although these could be estimated for a given shifter length, their interplay is hard to estimate.

Therefore, we define the average shift value as the quotient of $n + 1$ and the number of loops, i.e., the number of all necessary additions. From simulations we obtained the following shift values.

These numbers (and some which are not shown here) tell us:

- The average shift value does not depend noticeably on the number k of buffer bits, as long as k > 20.

- The average shift values come closer to the theoretical ones the longer the registers are.

- The average shift values come closer to the theoretical ones the longer the shifters are.
Fig. 1. The multiple ZDN-based modular multiplication.

Silicon Realization. For the development of a new co-processor capable of
handling 2048-bit RSA, a study was done in order to compare two designs:

1. Realization of the simple (i = 1) ZDN algorithm Modular Multiplication VIII
with n = 2048.

2. Realization of the triple (i = 3) ZDN-based modular multiplication for
n = 1024, with an additional feature described in [FS1], called MultModDiv.

Using the method described in [FS1], one needs 6 modular multiplications of
length 1024 bits in order to emulate one modular multiplication of length 2048 bits.
Since in our architectures the complexity of a modular multiplication is linear
in the bit length of the multiplier, and because the algorithm in 2) is 3 times as
fast as the one in 1), the performance (for RSA 2048) of these two architectures
is theoretically comparable.
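Under the linear cost model just stated, the comparison is a line of arithmetic: six 1024-bit multiplications at three times the speed cost as much as one 2048-bit multiplication at single speed (6 · 1024 / 3 = 2048), while 1024-bit RSA gains the full factor 3. The Python sketch below only makes this bookkeeping explicit; `relative_cost` is a hypothetical helper, and time is measured in arbitrary units rather than in cycles of either chip.

```python
from fractions import Fraction


def relative_cost(bit_length: int, speedup: int, multiplications: int = 1) -> Fraction:
    """Time in arbitrary units: linear in the multiplier length, divided by the
    speed-up of the ZDN variant (1 for the simple, 3 for the triple algorithm)."""
    return Fraction(multiplications * bit_length, speedup)


# Design 1): one 2048-bit multiplication on the simple (i = 1) unit.
design1_rsa2048 = relative_cost(2048, speedup=1)
# Design 2): six 1024-bit multiplications (MultModDiv emulation) on the triple (i = 3) unit.
design2_rsa2048 = relative_cost(1024, speedup=3, multiplications=6)

# For 1024-bit RSA, design 1) still runs at speed-up 1, design 2) at speed-up 3.
rsa1024_ratio = relative_cost(1024, speedup=1) / relative_cost(1024, speedup=3)

print(design1_rsa2048, design2_rsa2048, rsa1024_ratio)  # 2048 2048 3
```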

Design 2) has the advantage that RSA with 1024-bit length runs three times as
fast as in design 1). On the other hand, 2) is more complex, both in the hardware
implementation and in the implementation of the necessary software.

Interestingly, it turned out that both designs cost approximately the same
silicon area: design 2) needs a very sophisticated 7-operand adder with more
shifters than the 3-operand adder used in 1). However, in 2) only an adder of
length 1024 bits has to be realized! All in all, both calculation units have about
the same size. In both cases, the silicon area for generating the control structures
is negligible compared to the adders. Finally, design 1) was realized as a
co-processor for chip card applications.
Abstract. A compact mesh architecture for supporting the relation collection
step of the number field sieve is described. Differing from TWIRL, only isolated
chips without inter-chip communication are used. According to a preliminary
analysis for 768-bit numbers, with a 0.13 µm process one mesh-based device fits
on a single chip of ≈ (4.9 cm)²; the largest proposed chips in the TWIRL cluster
for 768-bit occupy ≈ (6.7 cm)².

A 300 mm silicon wafer filled with the mesh-based devices is ≈ 6.3 times
slower than a wafer with TWIRL clusters, but due to the moderate chip
size, lack of inter-chip communication, and the comparatively regular
structure, from a practical point of view the mesh-based approach might
be as attractive as TWIRL.

Keywords: factorization, number field sieve, RSA 



1 Introduction 

Initiated by Bernstein’s paper [Ber01], in the last few years several proposals for
speeding up the linear algebra step of the number field sieve (NFS) by means of
specialized hardware have been put forward. While Bernstein’s original proposal
relied on the use of a parallel sorting algorithm, Lenstra et al. derived an
improved mesh architecture that relies on a parallel routing algorithm [LSTT02].
Finally, in [GS03b] distributed variants of the proposals in [Ber01,LSTT02] are
discussed, where the main focus is on deriving a design that can be realized with
current standard technology. In summary, building a device that performs the
linear algebra step of the NFS for 768- or 1024-bit numbers within a few hours
must be considered doable with current technology.

Using the words from [LSTT02], one can “conclude that from a practical
standpoint, the security of RSA relies exclusively on the hardness of the relation
collection step of the number field sieve.” Thus, it is no surprise that several
attempts have been made to apply dedicated hardware to speed up the relation
collection step of the NFS, too. In particular, the TWINKLE device [Sha99,LS00]
and the mesh-based design of [GS03a] can be seen in this context. However, none
of these devices was practically capable of coping with the relation collection step
of 1024-bit numbers. A significant step forward has recently been achieved by Shamir
and Tromer [ST03]: the TWIRL device they describe could in principle complete
the sieving part of the NFS for 1024-bit numbers in less than a year by

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 278-291, 2004. 

© Springer- Verlag Berlin Heidelberg 2004 
means of current technology. However, already for 768-bit numbers, chip sizes of
up to ≈ (6.7 cm)² (with an irregular layout) have been proposed. Although the
proposed TWIRL parameters are not optimized for chip size, actually manufacturing
a TWIRL cluster seems extraordinarily challenging. For 1024-bit numbers
a TWIRL cluster including a wafer-sized silicon chip has been proposed; thus,
from a practical point of view the question arises whether it is possible to make
do with more regular and smaller chips. Also, avoiding inter-chip communication
seems desirable.

In this contribution we discuss a different design which is based on a routing
mesh running at 500 MHz (instead of 1 GHz in TWIRL, where the time-critical
operations are simpler). For 768-bit numbers our proposal consists of a mesh
of 256 × 256 almost identical processing units. The layout is rather regular, and
the estimated silicon area for a complete mesh is about (4.9 cm)². Counting
processing time per wafer, the estimated time needed for the sieving step with
768-bit numbers is about 6.3 times higher than estimated in [ST03] for TWIRL.
However, due to the simpler design, manufacturing the device and applying it
at least to 768-bit numbers does not seem unrealistic. For 1024-bit numbers we cannot
give a reliable answer yet, but as the design presented here allows for a rather
compact storage of the factor bases (a critical point in TWIRL), exploring the
1024-bit case in more detail is certainly worthwhile.

2 Preliminaries 

2.1 The Sieving Part of the NFS 

A standard reference for an introduction to the number field sieve is [LHWL93].
Here we only recall those aspects of the sieving step which are relevant for 
describing our device. 

In the first step of the NFS, two univariate polynomials $f_1(x), f_2(x) \in \mathbb{Z}[x]$
are determined that share a common root m modulo n:

$$f_1(m) \equiv f_2(m) \equiv 0 \pmod{n}$$

Typically, $f_1(x)$ is of degree $d \ge 5$ and $f_2(x)$ is monic and linear (i.e.,
$f_2(x) = x - m$). From these two polynomials the bivariate and homogeneous
polynomials $F_1(x,y), F_2(x,y) \in \mathbb{Z}[x,y]$ are derived via $F_1(x,y) := y^d \cdot f_1(x/y)$
resp. $F_2(x,y) := y \cdot f_2(x/y)$. Now everything related to $f_1(x)$ resp. $F_1(x,y)$
is said to belong to the algebraic side, and everything related to $f_2(x)$ resp.
$F_2(x,y)$ is referred to as the rational side. In particular, for given smoothness
bounds $B_1, B_2 \in \mathbb{N}$ the sets

$$P_i := \{(p, r) : f_i(r) \equiv 0 \pmod{p},\ p \text{ prime},\ p < B_i,\ 0 \le r < p\} \subset \mathbb{Z}^2 \quad (i = 1, 2)$$

are referred to as algebraic and rational factor base, respectively. Following [ST03],
for the factorization of a 768-bit number, we assume $B_1 = 10^9$ and $B_2 = 10^8$.
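To see the definition of $P_i$ in action, here is a brute-force Python sketch that lists the factor bases of a toy pair of polynomials. The polynomials ($f_1(x) = x^2 + 1$ and $f_2(x) = x - 10$, sharing the root m = 10 modulo n = 101) and the tiny bounds are purely illustrative assumptions, far from the 768-bit parameters; a real implementation would find roots modulo p by polynomial root-finding rather than by trying every r.

```python
from math import isqrt


def primes_below(bound: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p < bound."""
    if bound < 3:
        return []
    is_prime = bytearray([1]) * bound
    is_prime[0] = is_prime[1] = 0
    for q in range(2, isqrt(bound - 1) + 1):
        if is_prime[q]:
            is_prime[q * q :: q] = bytes(len(range(q * q, bound, q)))
    return [q for q in range(bound) if is_prime[q]]


def factor_base(coeffs: list[int], bound: int) -> list[tuple[int, int]]:
    """P := {(p, r) : f(r) = 0 (mod p), p prime, p < bound, 0 <= r < p}.

    `coeffs` lists the coefficients of f from the constant term upwards.
    Roots are found by brute force, which is only sensible for toy sizes.
    """
    base = []
    for p in primes_below(bound):
        for r in range(p):
            if sum(c * pow(r, k, p) for k, c in enumerate(coeffs)) % p == 0:
                base.append((p, r))
    return base


# Toy example (hypothetical, not the 768-bit setting): n = 101, m = 10.
P1 = factor_base([1, 0, 1], 14)   # algebraic side, f1(x) = x^2 + 1
P2 = factor_base([-10, 1], 10)    # rational side,  f2(x) = x - 10
print(P1)  # [(2, 1), (5, 2), (5, 3), (13, 5), (13, 8)]
print(P2)  # [(2, 0), (3, 1), (5, 0), (7, 3)]
```

On the rational side every prime contributes exactly one pair, namely $r = m \bmod p$, whereas on the algebraic side a prime may contribute zero or several roots.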

Throughout the relation collection step, pairs of coprime integers $(a, b) \in
\mathbb{Z} \times \mathbb{N}$ are to be found such that the values $F_1(a,b)$ and $F_2(a,b)$ are smooth.
This means that both $F_1(a,b)$ and $F_2(a,b)$ factor over the primes $< B_1$ resp.
$< B_2$, except for a small number of prime factors; the precise number of ‘extra’
prime factors on the rational and algebraic side is not necessarily identical. The
actual computation of (a, b)-pairs where both $F_1(a,b)$ and $F_2(a,b)$ are smooth
can be performed by sieving over a rectangular region $-A \le a < A$, $0 < b < B$,
where $A, B \in \mathbb{N}$. For organizing this sieving, different techniques are available;
here we focus on so-called line sieving, which is outlined in Figure 1. Here,
the threshold bounds $T_1, T_2$ correspond to the bit length of the remaining cofactor
on the algebraic resp. rational side. These bounds have to be updated several
times throughout the sieving. In an actual implementation the values $\log_2(p)$ are
usually replaced by an integer approximation. Also, the use of base-2 logarithms
is not mandatory; in analogy to [ST03], we subsequently use a 10-bit counter
for summing up approximations $\lceil \log_{\sqrt{2}}(p) \rceil$. Finally, note that in the last step
of the main loop in Figure 1 it is computationally too expensive to identify the
factors of $F_i(a, b)$ through a simple trial division by the primes in the respective
factor base. To cope with this problem, the sieving mesh described below reports
the prime factors of $F_i(a,b)$ that have been found.



2.2 Clockwise Transposition Routing 

An important algorithmic tool used in the sieving device described below is 
a modification of a fast parallel routing algorithm described in [LSTT02] in 
the context of fast matrix-vector multiplication. We start by recalling the main 
ingredients of this clockwise transposition routing: 

— The hardware platform is a mesh of (rather simple) processing units where
each unit is connected to its horizontal and vertical neighbours.

— In each step of the algorithm a processing unit holds no more than one packet 
that is to be routed; only one packet can be sent and received per step. 

— At the beginning of the algorithm some mesh nodes contain a data packet
(the other nodes contain a nil value). Along with each data packet the coordinates
of a processing unit in the mesh, the so-called target node, are stored,
and the goal of the algorithm is to route all packets to their respective targets.



b ← 0
repeat
    b ← b + 1
    for i ← [1, 2]
        S_i(a) ← 0    (∀a : −A ≤ a < A)
        for (p, r) ← P_i
            S_i(br + kp) ← S_i(br + kp) + log_2(p)    (∀k : −A ≤ br + kp < A)
    for a ∈ {−A ≤ a < A : gcd(a, b) = 1, S_1(a) > T_1, and S_2(a) > T_2}
        check if both F_1(a, b) and F_2(a, b) are smooth
until enough pairs (a, b) with both F_1(a, b) and F_2(a, b) smooth are found

Fig. 1. Line sieving
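The loop structure of Fig. 1 can be sketched in Python as follows, using the toy polynomials $x^2 + 1$ and $x - 10$ (a hypothetical example, not the 768-bit setting). The sketch adds the rounded value of $\log_2(p)$ at every position $a \equiv br \pmod{p}$, uses fixed thresholds instead of updated cofactor bounds, ignores prime powers and projective roots, and confirms candidates by plain trial division; all function names are our own.

```python
import math


def factor_base(coeffs, bound):
    """Brute-force pairs (p, r) with f(r) = 0 (mod p), p prime < bound (toy sizes only)."""
    primes = [q for q in range(2, bound) if all(q % d for d in range(2, math.isqrt(q) + 1))]
    return [(p, r) for p in primes for r in range(p)
            if sum(c * pow(r, k, p) for k, c in enumerate(coeffs)) % p == 0]


def F(coeffs, a, b):
    """Homogenized value F(a, b) = b^d * f(a/b), coefficients from the constant term up."""
    d = len(coeffs) - 1
    return sum(c * a**k * b**(d - k) for k, c in enumerate(coeffs))


def is_smooth(value, bound):
    """True iff value != 0 and |value| has no prime factor >= bound (trial division)."""
    v = abs(value)
    if v == 0:
        return False
    for q in range(2, bound):
        while v % q == 0:
            v //= q
    return v == 1


def line_sieve(polys, bases, bounds, thresholds, A, max_b):
    """Fig. 1 for b = 1..max_b: sieve S_i(a) for -A <= a < A, then check candidates."""
    found = []
    for b in range(1, max_b + 1):
        sums = []
        for fb in bases:  # i = 1 (algebraic), i = 2 (rational)
            s = [0] * (2 * A)  # s[a + A] plays the role of S_i(a)
            for p, r in fb:
                start = -A + (b * r + A) % p  # smallest a >= -A with a = b*r (mod p)
                for a in range(start, A, p):
                    s[a + A] += round(math.log2(p))  # integer approximation of log2(p)
            sums.append(s)
        for a in range(-A, A):
            if (math.gcd(a, b) == 1
                    and sums[0][a + A] > thresholds[0]
                    and sums[1][a + A] > thresholds[1]
                    and all(is_smooth(F(c, a, b), B) for c, B in zip(polys, bounds))):
                found.append((a, b))
    return found


polys = [[1, 0, 1], [-10, 1]]  # toy: f1 = x^2 + 1, f2 = x - 10 (common root 10 mod 101)
bounds = [30, 30]
bases = [factor_base(c, B) for c, B in zip(polys, bounds)]
pairs = line_sieve(polys, bases, bounds, thresholds=[3, 3], A=50, max_b=5)
print(len(pairs), pairs[:5])
```

With thresholds below zero every coprime (a, b) is checked, so the sieve degenerates into exhaustive trial division; raising the thresholds trades a few missed relations for far fewer expensive smoothness checks, which is exactly the role of the sieve.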
— The actual routing is done by repeating the following four steps until the
mesh is ‘empty’, i.e., all packets have reached their target node (where a
packet is removed from the mesh):

1. Each node located on an odd row compares its packet with the node 
above it (if any). The packets are exchanged if and only if the distance- 
to-target of the non-nil value that vertically is farthest away from its 
target node is reduced in this way. 

2. Each node located on an odd column compares its packet with the node
to its right (if any). The packets are exchanged if and only if the distance-to-target
of the non-nil value that horizontally is farthest away from its
target node is reduced in this way.

3. Identical to the first step with ‘above’ replaced by ‘below’. 

4. Identical to the second step with ‘right’ replaced by ‘left’. 
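To make the four-step schedule concrete, the following Python sketch simulates clockwise transposition routing on a small M × M mesh without wrap-around, with every node holding at most one packet. The 0-based choice of which rows and columns count as ‘odd’, the tie-breaking when two packets are equally far from their targets, and the step limit (a safeguard against the non-termination discussed below) are our assumptions; the torus variant and the refilling rule of Section 3 are not modelled.

```python
from typing import List, Optional, Tuple

Packet = Optional[Tuple[int, int]]  # (target_row, target_col), or None for nil


def should_swap(pa: Packet, pos_a: int, pb: Packet, pos_b: int, axis: int) -> bool:
    """Exchange iff the non-nil packet that is farthest from its target along
    `axis` (0 = rows, 1 = columns) gets closer by the exchange."""
    candidates = [(abs(p[axis] - here), p, there)
                  for p, here, there in ((pa, pos_a, pos_b), (pb, pos_b, pos_a))
                  if p is not None]
    if not candidates:
        return False
    dist, packet, new_pos = max(candidates, key=lambda c: c[0])
    return abs(packet[axis] - new_pos) < dist


def route(mesh: List[List[Packet]], max_steps: int = 10_000) -> Optional[int]:
    """Clockwise transposition routing on an M x M mesh (no wrap-around).
    Returns the number of compare/exchange steps until the mesh is empty,
    or None if `max_steps` is exhausted first."""
    M = len(mesh)
    grid = [row[:] for row in mesh]

    def deliver() -> None:
        # A packet that has reached its target node is removed from the mesh.
        for r in range(M):
            for c in range(M):
                if grid[r][c] == (r, c):
                    grid[r][c] = None

    deliver()
    # Steps 1-4 pair rows/columns (k, k+1) with k starting at 0, 1, 1, 0.
    first_k = (0, 1, 1, 0)
    for step in range(max_steps):
        if all(p is None for row in grid for p in row):
            return step
        sub = step % 4
        axis = sub % 2  # steps 1 and 3 compare vertically, steps 2 and 4 horizontally
        for k in range(first_k[sub], M - 1, 2):
            for j in range(M):
                (ra, ca), (rb, cb) = ((k, j), (k + 1, j)) if axis == 0 else ((j, k), (j, k + 1))
                if should_swap(grid[ra][ca], k, grid[rb][cb], k + 1, axis):
                    grid[ra][ca], grid[rb][cb] = grid[rb][cb], grid[ra][ca]
        deliver()
    return None if any(p is not None for row in grid for p in row) else max_steps


mesh = [[None] * 4 for _ in range(4)]
mesh[0][0] = (3, 3)  # one packet from the top-left to the bottom-right corner
print(route(mesh))   # 8
```

Under these conventions the single corner-to-corner packet needs 8 compare/exchange steps (6 moves plus two steps in which its row or column is not paired); the diagonal distance of 6 is the lower bound.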

A theoretical analysis of this algorithm is still lacking, but experimental results
demonstrate its efficiency: e.g., for the situation considered in [LSTT02], the
running time of the algorithm did not exceed 2M steps when dealing with an
M × M mesh. In the application below, no ‘packet cancellation’ (see [LSTT02])
is used, and several parts of the original algorithm have been modified. Again,
simulations indicate that the resulting algorithm is rather efficient. However,
as in [LSTT02], we cannot provide a theoretical analysis of our
approach at the moment.

Although we never encountered an infinite loop in our simulations, it can in
principle happen that our routing algorithm does not terminate for certain
parameter choices. But this is of no practical concern: in the sieving procedure
described later, only a certain period of time is allotted for sieving a particular
range of numbers, and in the (rare) case that the routing cannot be completed
within this time limit, say due to an infinite loop, only some (a, b)-candidates of
that sieving interval are lost. In contrast to the linear algebra step of the NFS,
where an incorrect intermediate result can have devastating consequences, the
sieving process is rather robust with respect to such errors.

3 Adapting the Routing Algorithm 

Subsequently, we deal with a mesh of size $M \times M = 2^m \times 2^m$; for 768-bit
numbers we can think of m = 8. Consequently, the largest distance a packet may
have to travel during the routing is $2 \cdot (M - 1) = 2M - 2$. As a first modification,
we want to ‘connect’ the borders of the mesh to get a torus topology and thereby
reduce this maximal distance to $2 \cdot (M/2) = M$.

3.1 Using a Torus Topology 

Having in mind an actual mesh of processing units, it is not desirable to install
physical connections between the horizontal resp. vertical borders of the mesh,
as the wires used for the ‘wrap around’ would have to cross a length of at least M − 2
processing units. Instead, a standard trick can be used to derive a layout where
connecting wires never cross more than one processing unit:
1. The $2^m \times 2^m$ processing units are arranged in a square, and we denote the
columns in this square by $c_i$ ($1 \le i \le 2^m$): $c_1 | c_2 | c_3 | c_4 | \ldots | c_{2^m-2} | c_{2^m-1} | c_{2^m}$

2. Reversing the order of the columns $c_{2^{m-1}+1}, \ldots, c_{2^m}$ followed by applying a
perfect shuffle yields the desired column positions:

$$c_1 | c_{2^m} | c_2 | c_{2^m-1} | \ldots | c_{2^{m-1}+2} | c_{2^{m-1}} | c_{2^{m-1}+1}$$

3. For implementing the vertical wrap around, the same trick is now applied to
the rows of the resulting arrangement.

This rearrangement of the processing units is ‘just’ for the ease of implementation 
and helps to circumvent the handling of extremely unbalanced running times of 
signals. Thus, in the sequel when discussing algorithmic aspects and the labeling 
of processing units used in a computation we can ignore this implementation 
detail and think of an ordinary mesh architecture with wrapped-around borders. 
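The column permutation of steps 1 and 2 is easy to check mechanically. This Python sketch (the helper names are hypothetical) computes the physical left-to-right order of the logical columns and the longest physical distance between logically adjacent columns, including the wrap-around pair $c_{2^m}, c_1$; a value of 2 means that no wire crosses more than one processing unit.

```python
def torus_column_order(n_cols: int) -> list[int]:
    """Physical left-to-right order of logical columns c_1..c_n: reverse the
    second half, then perfect-shuffle it with the first half."""
    half = n_cols // 2
    first = list(range(1, half + 1))          # c_1, ..., c_{n/2}
    second = list(range(n_cols, half, -1))    # c_n, ..., c_{n/2+1}
    order = []
    for left, right in zip(first, second):    # perfect shuffle
        order += [left, right]
    return order


def longest_wire(n_cols: int) -> int:
    """Largest physical distance between logically adjacent columns, including
    the wrap-around pair (c_n, c_1)."""
    position = {col: idx for idx, col in enumerate(torus_column_order(n_cols))}
    return max(abs(position[c] - position[c % n_cols + 1]) for c in range(1, n_cols + 1))


print(torus_column_order(8))  # [1, 8, 2, 7, 3, 6, 4, 5]
print(longest_wire(256))      # 2
```

In the naive order the wrap-around wire between $c_{2^m}$ and $c_1$ would span $2^m - 1$ positions; after the rearrangement every logical neighbour is at most two positions away.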

3.2 Finding the Route to a Target Node 

Each sieving line is split into subintervals of length S, and the mesh processes
these subintervals one by one. For 768-bit numbers, we can think of $S = 2^{24}$; thus
in a mesh of size 256 × 256 each processing unit will be in charge of $256 = 2^8$
consecutive sieve positions, and we focus on this parameter choice.

When preparing a packet that is to be routed in our sieving procedure, we are
given a 24-bit number r that represents a non-negative integer $0 \le r < S$, and
we use only this value to identify the corresponding packet’s route to its target
node. W.l.o.g. we can choose the start $s_0$ of the sieving subinterval such that
$256 \mid s_0$, i.e., once the packet containing r has arrived at the processing unit that
processes the range $\{s_0 + i \cdot 256, \ldots, s_0 + i \cdot 256 + 255\}$ (with $i \in \{0, \ldots, 2^{16} - 1\}$),
the least-significant 8 bits of r determine which of the sieve positions of that
processing unit has to be addressed. Thus, we want to interpret the 16-bit
number i, which indicates the number of the processed subinterval, as the (x, y)-coordinate
($0 \le x, y \le 2^8 - 1$) of the target node.

There are various possibilities to encode these coordinates in the remaining
24 − 8 = 16 bits of r. For our purposes the following approach is useful: we
store the x-coordinate of the target node in the odd-numbered bit positions
(23, 21, 19, 17, ..., 9, where bit no. 23 is the most-significant bit) and the
y-coordinate in the even bit positions (22, 20, 18, 16, ..., 8). This interleaving of the
coordinates can also be interpreted as a Kronecker/tensor product: the leading
two bits of r determine in which $2^7 \times 2^7$ (sub)quadrant of the $2^8 \times 2^8$ mesh the
target node is located. Similarly, the next two bits determine which (sub)quadrant
(of size $2^6 \times 2^6$) inside the $2^7 \times 2^7$ quadrant has to be addressed, etc. To give a
better picture of the resulting pattern, Figure 2 sketches for small values of i to
which processing unit the sieving range $\{s_0 + i \cdot 256, \ldots, s_0 + i \cdot 256 + 255\}$ is
assigned; e.g., the processing unit at the left border of the third line in the mesh
handles the 256 values $\{s_0 + 4 \cdot 256, \ldots, s_0 + 4 \cdot 256 + 255\}$.
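Reading off the target coordinates is a pure bit-deinterleaving operation. The following Python sketch, with hypothetical function names, extracts $x_t$ from bits 9, 11, ..., 23 and $y_t$ from bits 8, 10, ..., 22 of r and also provides the inverse; the example reproduces the assignment quoted above for i = 4, assuming that the rows of Figure 2 are indexed by $y_t$ and the columns by $x_t$.

```python
def split_r(r: int) -> tuple[int, int, int]:
    """Split a 24-bit r into (x_t, y_t, offset): x_t from the odd bit positions
    9, 11, ..., 23, y_t from the even positions 8, 10, ..., 22, and the sieve
    position inside the target unit from bits 0-7."""
    x = y = 0
    for k in range(8):
        x |= ((r >> (9 + 2 * k)) & 1) << k
        y |= ((r >> (8 + 2 * k)) & 1) << k
    return x, y, r & 0xFF


def join_r(x: int, y: int, offset: int) -> int:
    """Inverse of split_r: interleave x_t and y_t above the 8-bit offset."""
    r = offset
    for k in range(8):
        r |= ((x >> k) & 1) << (9 + 2 * k)
        r |= ((y >> k) & 1) << (8 + 2 * k)
    return r


# Subinterval i = 4 (r = 4 * 256 + offset) lands in row y = 2 ("third line"), column x = 0.
print(split_r(4 * 256 + 17))  # (0, 2, 17)
```

Because the two most significant bits of r are the top bits of $x_t$ and $y_t$, every further pair of bits refines the (sub)quadrant, which is the Kronecker/tensor structure described in the text.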

Fig. 2. Assigning sieving ranges to processing units

Given an r-value we can extract the x-coordinate $x_t$ and the y-coordinate
$y_t$ of the respective target unit by reading off the odd- resp. even-numbered
bit positions. While in the original clockwise transposition routing as described
in [LSTT02] storing the target coordinates in each packet is sufficient for deciding
efficiently when two packets have to be exchanged, here we also have to take
care of the wrapped-around borders. For this, we provide one extra bit along each
axis which is set if and only if the packet wants to ‘cross the border’ for reaching
its target. In other words, for a node with coordinates $(x_0, y_0)$ in a $2^m \times 2^m$ mesh,
the ‘horizontal cross border bit’ is set if and only if $|x_t - x_0| > 2^{m-1}$. Thus, with
each packet that is to be routed we store the two (m-bit) target coordinates
$(x_t, y_t)$ plus the two ‘cross border flags’. Using an 8-bit comparator and a simple
circuit that deals with the cross border flags and the most significant bit of the
target coordinates, we can easily decide if two adjacent nodes have to exchange
their packets. As in [LSTT02], we assume that no more than one
clock cycle is to be used for such a compare/exchange operation.
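The cross border rule can be sketched at the level of integers (the hardware instead uses an 8-bit comparator plus a small circuit). In this Python sketch, `torus_distance` and `step_direction` are hypothetical helpers that illustrate how the flag reverses the naive direction of travel, and why no distance along one axis exceeds $2^{m-1} = 128$.

```python
def cross_border(target: int, own: int, m: int = 8) -> bool:
    """'Cross border' flag along one axis: set iff |target - own| > 2^(m-1)."""
    return abs(target - own) > 2 ** (m - 1)


def torus_distance(target: int, own: int, m: int = 8) -> int:
    """Distance along one axis of the 2^m-periodic torus, using the flag above."""
    d = abs(target - own)
    return 2 ** m - d if cross_border(target, own, m) else d


def step_direction(target: int, own: int, m: int = 8) -> int:
    """+1 / -1 / 0: which neighbour brings the packet closer. The flag reverses
    the naive direction because the short way leads across the border."""
    if target == own:
        return 0
    naive = 1 if target > own else -1
    return -naive if cross_border(target, own, m) else naive


print(torus_distance(250, 3), step_direction(250, 3))  # 9 -1
```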



3.3 Refilling the Mesh while Routing 

Experimentally, it turns out that the routing algorithm performs better if the
number of ‘travelling’ packets is not too high. In our application, a processing
unit usually has several packets that have to be output on the mesh for being
routed to the respective target node. In principle, a processing unit can release
a new packet whenever it does not currently hold another packet. However, to avoid a
slow-down through congestion of the mesh, in our experiments it turned out to
be more efficient to release a new packet only if no packet was stored in that node
during the previous two clock cycles. In this way, usually ≈ 25% of the processing
units are ‘free’, which (experimentally) allows for quite efficient routing. Each
routed packet represents a divisor of some number in the currently processed
sieving range, and the next section explains this connection in more detail.
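The release rule can be modelled as a tiny per-node state machine. This Python sketch is an illustration under assumptions: `ReleaseGate` is a hypothetical name, and it only tracks whether the node held a packet during each of the last two clock cycles, leaving aside how the packet is then forwarded.

```python
from collections import deque


class ReleaseGate:
    """A node may inject a new packet only if it held no packet during each of
    the previous `quiet_cycles` clock cycles (2 in the experiments above)."""

    def __init__(self, quiet_cycles: int = 2) -> None:
        # Occupancy of the node in the most recent cycles (True = held a packet).
        self.history = deque([False] * quiet_cycles, maxlen=quiet_cycles)

    def end_of_cycle(self, held_packet: bool) -> None:
        self.history.append(held_packet)

    def may_release(self) -> bool:
        return not any(self.history)


gate = ReleaseGate()
trace = []
for occupied in [True, False, False, True, False, False, False]:
    trace.append(gate.may_release())
    gate.end_of_cycle(occupied)
print(trace)  # [True, False, False, True, False, False, True]
```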

4 Organizing the Sieving 

The basic organization of the sieving process is the same as in [GS03a]; in
particular, we use line sieving, and when changing to a new line, i.e., increasing the
current b-value, new data has to be loaded into the mesh. For sieving one line
$-A \le a < A$ of ($\approx 3.4 \cdot 10^{13}$) a-values, only local operations inside the mesh
nodes are used. As indicated above, a sieving line is not processed ‘as one
piece’ but divided into consecutive intervals of length S (= $2^{24}$). Before going
into the details of how these subintervals are processed, we have to say how the
two factor bases are represented in the mesh; here we want to exploit the
Kronecker/tensor product arrangement explained in Section 3.2.



4.1 Storing the Factor Bases in the Mesh 

Each processing unit stores¹ all elements (p, r) of the factor bases where the
prime p is smaller than the size of the subinterval of the current sieving line
processed by that processor; with the mentioned parameters for 768-bit numbers,
this translates to the condition $p < 2^8$. More precisely, for $p < 2^8$
and (p, r) contained in a factor base, each processing unit stores the value
$(p, (s_0 + r + 2^8 \cdot i) \bmod p)$, where $s_0$ is the first value in the processed sieving
range of length S, and $i \in \{0, \ldots, 2^{16} - 1\}$ is the number of the subinterval of
length 256 processed by that unit. The idea here is that a processing unit will
be able to test locally which ‘tiny’ primes divide an element in the sieving range
processed in that node. Next, all pairs (p, r) with primes p that are ‘up to 4 times
larger’, namely with $2^8 \le p < 2^{10}$, are stored ‘once per 2 × 2 square’ of the
mesh. With the numbering from Section 3.2, this means that the prime p (along 
with the corresponding (sq + r + 2^'^ ■ i) mod p- values) is stored in all processing 
units where the ‘least significant tensor coordinates’ (bits no. 0 and 1 in the
binary representation of the number i of the subinterval of length 256) coincide.
Again, the idea is to allow for a ‘local’ handling of prime divisors: a subquadrant
of size 2 × 2 covers a sieving range of $2^2 \cdot 2^8 = 2^{10}$ numbers, and all prime divisors
of size $< 2^{10}$ are available inside that square.

Next, we proceed analogously for submeshes of size 4 × 4 and primes $2^{10} \le
p < 2^{12}$. In other words, all processing units where bits no. 0-3 of the
binary representation of the number i of the processed subinterval coincide,
store the same pairs (p, r) — where in analogy to the above r is replaced by 
(so + ^ + 2^^ • i) mod p. So in each 4x4 subquadrant — which corresponds to a 
sieving range of length $4^2 \cdot 2^8 = 2^{12}$ — all primes $< 2^{12}$ are ‘available’. Going on in
this way, in each submesh of size 8 × 8 we store the pairs (p, r) with $2^{12} \le p < 2^{14}$,
in each submesh of size 16 × 16 we take care of the primes $2^{14} \le p < 2^{16}$, in
each submesh of size 32 × 32 we deal with the primes $2^{16} \le p < 2^{18}$, and in each
submesh of size 64 × 64 we store the pairs (p, r) with $2^{18} \le p < 2^{20}$. All pairs (p, r)
with $p \ge 2^{20}$ are stored only once in the mesh. We do not consider subquadrants
of size 128 × 128: due to the underlying torus topology, the horizontal or vertical
distance between two nodes cannot be larger than 128 anyway.
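The replication scheme maps every prime to the size of the submesh in which it is stored once, and the same size determines the relevance bound used in Section 4.2 (r < 256 for p < 256, r < 1024 for 256 ≤ p < 1024, and so on). The Python sketch below encodes this. Treating $p \ge 2^{20}$ as ‘stored once in the whole 256 × 256 mesh’, and hence relevant for all $r < S = 2^{24}$, is our reading of the text, and the helper names are hypothetical.

```python
def submesh_side(p: int, mesh_side: int = 256) -> int:
    """Side length of the submesh in which (p, r) is stored once:
    1 for p < 2^8, 2 for 2^8 <= p < 2^10, ..., 64 for 2^18 <= p < 2^20,
    and the whole mesh for p >= 2^20."""
    side, limit = 1, 2 ** 8
    while side < 64 and p >= limit:
        side, limit = side * 2, limit * 4
    return side if p < limit else mesh_side


def is_relevant(r: int, p: int) -> bool:
    """An r-value matters for the current subinterval iff it falls into the
    sieving range covered by the submesh holding p (256 positions per node)."""
    return r < 256 * submesh_side(p) ** 2


print([submesh_side(p) for p in (101, 257, 1031, 65537, 1048573, 1048583)])
# [1, 2, 4, 32, 64, 256]
```

Each step to a submesh of twice the side quadruples both the covered sieving range and the prime range, so every prime of a band is available inside any submesh of that size, and no prime is replicated more often than necessary.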

¹ For the moment, we postpone a discussion of how to store these pairs.

For an actual implementation, the question of how to store the pairs (p, r)
is crucial: with smoothness bounds $B_1 = 10^9$ and $B_2 = 10^8$, the rational factor
base contains 5,761,456 pairs (p, r) in total, and the algebraic factor base can
be assumed to consist of ≈ 50,850,000 pairs. With the prime distribution just
described and leaving some leeway for multiple prime factors, we conclude that
each node in a 256 × 256 mesh has to store ≈ 1,300 pairs (p, r). Storing them
as pairs of 30-bit numbers in DRAM is extraordinarily space/area-consuming
and not suitable for our purposes. Thus, before going into algorithmic details
we should clarify how to store these pairs more efficiently: for storing its factor
base elements, each node is equipped with three rectangular blocks of DRAM,
where one DRAM block can be accessed in ‘words’ of 28 bits, one block can be
accessed in ‘words’ of 31 bits, and the other block allows for access in ‘words’ of
7 bits. Within each block, sequential access is sufficient; random access is not
required. The usage of the memory blocks depends on the size of the processed
primes p: in analogy to [ST03], we call p

- tiny, if 2 ≤ p < 2^^,
- smallish, if 2^^ ≤ p < S,
- largish, if S ≤ p < B_2,
- hugish, if p ≥ B_2.

For reasons of efficiency, with each tiny or smallish prime we also store the
(non-negative) values S mod p and $\lceil \log_{\sqrt{2}}(p) \rceil$. The details of the encoding used
for storing the four different ‘prime types’ efficiently are given in [GS03c, Appendix A].

4.2 Sieving a Subinterval 

Let a be an arbitrary number from a subinterval of length $S = 2^{24}$ and $(p, r) \in P_i$
an element of a factor base. Then $\lceil \log_{\sqrt{2}}(p) \rceil$ is added to the ‘length counter’
$S_i(a)$ during line sieving if and only if $a \equiv br \pmod{p}$ (i = 1, 2). When the
factor bases have been loaded into the mesh as described, the mesh is prepared
to sieve the first subinterval $-A \le a < -A + S$ with b = 1. Each processing unit
is in charge of 256 a-values (see Section 3.2) and conceptually splits into three
parts:

The main part contains the DRAM with the stored factor bases along with the
necessary logic to read out these elements. In particular, this logic is in charge of
a flag which indicates whether currently the unique rational root (which is always
stored first) or an algebraic root is processed. Also, the 6-bit representation of
the current $\lceil \log_{\sqrt{2}}(p) \rceil$-value is stored in this part of the processing unit.

After having retrieved an r-value from the DRAM, we first check whether it
is ‘relevant’ for the current sieving subinterval of length S: as primes are stored
repeatedly in the mesh, this ‘relevance’ does not depend only on r, but also on the
size of p. More precisely, for p < 256 all values r < 256 have to be considered, for
256 ≤ p < 1024 all values r < 1024 are relevant, etc. For checking this ‘relevance
condition’ efficiently, we can make use of several OR gates that check the leading
bits of r and a chain of multiplexers that is controlled by $\lceil \log_{\sqrt{2}}(p) \rceil$. If this
r-value turns out not to be relevant, then we replace the old r-value in the DRAM
by the new value

$$r' := \begin{cases} r - (S \bmod p), & \text{if } r - (S \bmod p) \ge 0,\\ r - (S \bmod p) + p, & \text{otherwise.} \end{cases}$$

In this way, r is ‘shifted’ into the next subinterval of length S (cf. [GS03a]). Note
that the value S mod p is already known (for tiny and smallish primes it is stored
in the DRAM, and for largish and hugish primes we have S = S mod p), so this
computation can be implemented efficiently by means of a 30-bit adder. Now
the next root or prime can be processed.
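The update of a non-relevant r-value is a single modular subtraction. A minimal Python sketch, assuming $0 \le r < p$ and that the caller supplies the stored value S mod p (which equals S for largish and hugish primes); the function name is hypothetical. The result always equals $(r - S) \bmod p$, i.e., the same root measured from the start of the next subinterval.

```python
S = 2 ** 24  # length of a sieving subinterval for the 768-bit parameters


def shift_to_next_subinterval(r: int, p: int, s_mod_p: int) -> int:
    """r' = r - (S mod p), plus p if that difference is negative."""
    t = r - s_mod_p
    return t if t >= 0 else t + p


# Example: S mod 101 = 5, so the root r = 3 moves to (3 - 5) + 101 = 99.
print(shift_to_next_subinterval(3, 101, S % 101))  # 99
```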

On the other hand, if an r-value is identified as relevant, then the $(x_t, y_t)$-coordinates
of the node that is in charge of this value are determined by appending
the corresponding odd/even bit positions of r to the respective number of
most significant bits of the node’s own horizontal/vertical coordinate. In analogy
to the relevance test, the precise number of bits that are to be copied from
the node’s own coordinates depends on p. Then the ‘cross border’ flags $c_x, c_y$
are determined; for doing so, we may either use general adders with two (8-bit)
inputs or an optimized component that can check whether the horizontal resp.
vertical coordinate of the current node differs from $x_t$ resp. $y_t$ by more than 128.
If the coordinates $(x_t, y_t)$ are not identical with the node’s own coordinates, then
$(x_t, y_t)$, $(c_x, c_y)$, a ‘footprint’ of p, $\lceil \log_{\sqrt{2}}(p) \rceil$ (6 bits), the least significant 8 bits
of r, and the flag which indicates whether the prime belongs to the algebraic or
rational side are written into an output buffer which will be read by (but is not
part of) the mesh part of the node (see below). What does ‘footprint’ of p mean?
For promising (a, b)-pairs this footprint will be output to the processor that is in
charge of the post-processing of the candidate pairs; for primes larger than some
predetermined bound $B_f$, say $B_f = 2^{20}$, it should be possible to recover p from
the footprint. In principle, we could send the complete value p up to the least
significant bit here; however, to save some space a different footprint is preferable:
we send the coordinates of the current node (2 · 8 bits) concatenated with
bits no. 1-10 of p (i.e., the 10 least significant bits after dropping bit no. 0,
which is always set). As each processing unit stores only ≈ 850 prime numbers
larger than $B_f = 2^{20}$, this determines p uniquely in most cases. If a processing
unit contains more than one prime with this footprint, in the postprocessing all
primes with this footprint have to be taken into account. In summary, we write
$x_t, y_t$ (8 bits each), $c_x, c_y$ (1 bit each), $\lceil \log_{\sqrt{2}}(p) \rceil$ (6 bits), the footprint (26 bits),
the least significant 8 bits of r, and a one-bit flag that distinguishes between
the algebraic and the rational side into a 59-bit output buffer which will be read
by the mesh part of the node (see below). According to our experiments, it is
sufficient to provide space for two 59-bit entries in the output buffer; for storing
the buffer entries, we can use latches which require only 4 transistors per bit.²
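The buffer widths add up as stated: 8 + 8 + 1 + 1 + 6 + 26 + 8 + 1 = 59 bits for an output-buffer entry, and the 41-bit input-buffer entry of the memory part keeps only the last four fields. The Python sketch below packs such entries into integers; the order of the fields within a word, the helper names, and the sample values are assumptions made for illustration.

```python
# Field layout of one output-buffer entry (widths from the text; the order of
# the fields inside the word is our own assumption).
OUTPUT_FIELDS = [("x_t", 8), ("y_t", 8), ("c_x", 1), ("c_y", 1), ("log_p", 6),
                 ("footprint", 26), ("r_low", 8), ("algebraic", 1)]
# The 41-bit input buffer of the memory part drops the routing fields.
INPUT_FIELDS = [("r_low", 8), ("log_p", 6), ("footprint", 26), ("algebraic", 1)]


def footprint(node_x: int, node_y: int, p: int) -> int:
    """26 bits: node coordinates (2 x 8 bits) followed by bits 1-10 of the odd prime p."""
    return (node_x << 18) | (node_y << 10) | ((p >> 1) & 0x3FF)


def pack(fields, values: dict) -> int:
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < 1 << width:
            raise ValueError(f"{name}={value} does not fit into {width} bits")
        word = (word << width) | value
    return word


def unpack(fields, word: int) -> dict:
    values = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


entry = dict(x_t=17, y_t=200, c_x=0, c_y=1, log_p=41,
             footprint=footprint(3, 4, 1048583), r_low=99, algebraic=1)
word = pack(OUTPUT_FIELDS, entry)
print(sum(w for _, w in OUTPUT_FIELDS), sum(w for _, w in INPUT_FIELDS))  # 59 41
```

Since the footprint keeps only ten bits of p, recovering p in the postprocessing amounts to listing the primes above $B_f$ stored at the node given by the first 16 footprint bits and keeping those whose bits 1-10 match.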

Now the currently processed prime p is added to the current r-value (with a
30-bit adder needing no more than 2 clock cycles), and we have to check as above
whether the resulting new value r := r + p is also ‘relevant’ for the processed
sieving subinterval of length S. If this is not the case, then we update the old
r-value in the DRAM accordingly for the next sieving subinterval; otherwise
we determine another $(x_t, y_t)$-pair.

Once an $(x_t, y_t)$-pair with the node’s own coordinates is encountered, the
respective r-value has to be handled by the node itself, and thus the least significant
8 bits of r, $\lceil \log_{\sqrt{2}}(p) \rceil$ (6 bits), the 26-bit footprint of p, and a 1-bit flag which
indicates whether we deal with the rational or algebraic side are written into a
41-bit input buffer that is to be read by (and is part of) the memory part of
the node (see below). Hereafter, p is added to the current r-value and checked
for ‘relevance’ as already described. In summary, we estimate that realizing the
main part requires ≈ 2,750 transistors. Reading a DRAM entry takes 2 clock
cycles, and the retrieved values can be processed in a pipeline structure. Thus,
provided that the respective buffer is not full, basically every 4 clock cycles³ an
output is produced or the next p-value is selected. For storing the factor base
elements, about 55,000 bits of DRAM are needed.

² Actually, we could do with fewer bits: as the node’s own coordinates (which are
part of the 26-bit footprint) are identical for all buffer entries, we could save some
transistors by ‘hardcoding’ these bits.

The memory part of the node provides two 10-bit DRAM entries for each of the
256 a-values the node has to take care of: one 10-bit counter for the algebraic
and one for the rational side. These 10-bit words are initialized with zero and used
to store the sum of the $\lceil \log_{\sqrt{2}}(p) \rceil$-values that ‘hit’ the corresponding a-value
during sieving on the algebraic resp. rational side. It is convenient to organize
the DRAM for the 10-bit counters in 20-bit words, so that the algebraic and
rational counter can be read simultaneously; we will exploit this when checking
for simultaneous smoothness on the algebraic and the rational side.

The memory part reads from the mentioned 41-bit input buffer and uses
the least significant 8 bits of r and the rational/algebraic flag to address the
correct counter. To add the $\lceil \log_{\sqrt{2}}(p) \rceil$-value read from the input buffer to the
respective 10-bit value in the DRAM, a 10-bit adder is used. Finally, a different
part of the DRAM is needed to store footprints of prime factors larger than the
already mentioned predetermined bound $B_f$, say $B_f = 2^{20}$. As explained above,
for storing one footprint we need 26 bits. Moreover, we need 8 bits to identify
the precise sieving location within the node, plus 1 bit to distinguish between
the rational and the algebraic side. Thus, in total one complete entry occupies
26 + 8 + 1 = 35 bits of DRAM. However, instead of equipping each individual node
with DRAM for storing found prime factors, it seems more efficient to share
this DRAM between two nodes that are physical (cf. Section 3.1) neighbours.
Consequently, we add one bit to each entry to identify the processing unit. Of
course, the question arises how many prime factors will be encountered per
sieving subinterval. According to our experiments for $B_f = 2^{20}$ and 256 targets
per node, a DRAM size of 325 · 36 bits (shared by two nodes) seems reasonable.
Further, the question of choosing the size of the input buffer arises; according
to our experiments, a buffer with a single 41-bit entry should already be sufficient
to avoid a performance bottleneck.

Finally, we have to explain how to identify ‘good’ (a, b)-pairs and how to output
the respective prime factors from the device. For this purpose, the (somewhat
dirty) approach explained in [GS03c, Appendix B] seems reasonable. In
summary, for implementing the memory part ≈ 1,250 transistors should be
sufficient (excluding the DRAM). For incrementing one $\lceil \log_2(p) \rceil$-counter and
checking both thresholds we allow 4 clock cycles, which should provide enough

Largish and hugish primes can usually be processed in 2 clock cycles, as most of
them do not ‘hit’ in the current subinterval of length S.
leeway to store the (footprint of the) prime factor p into the DRAM. In addition
to this, on average ≈ 11,000 bits of DRAM (with random access) are needed for
the found prime factors and for the $\lceil \log_2(p) \rceil$-counters and the thresholds $T_r$,
$T_a$.
In the mesh part, the complete logic necessary for the clockwise transposition
routing is located. In particular, this includes an 8-bit comparison unit plus
some circuitry for taking care of the ‘cross border flags’ in every second node,
which allows for an efficient exchange operation (one clock cycle; cf. Section 3.2).
The mesh part contains a register to store a complete packet as transported
in the mesh. This register has the same width as the output buffer, and if a
packet with $(x_t, y_t)$ identical to the node’s own coordinates is encountered
(and the input buffer of the memory part is not full), the 26-bit footprint of p,
$\lceil \log_2(p) \rceil$, the least significant 8 bits of r, and the factor base flag are copied
into the input buffer of the memory part. New packets that have to be released
into the mesh are read from the output buffer (which in turn is filled by the
main part of the node as explained above). Implementing the mesh part should,
‘on average’, require no more than 1,100 transistors.



4.3 Output of the Result and Moving to the Next Sieving Interval 

Once a complete subinterval of size S has been sieved, we have to output the
found (a, b)-pairs: for doing so, each processing unit that has set the ‘done’ flag
during the sieving procedure outputs the footprints of all stored prime factors
along with the corresponding factor base indices (1 bit each), the coordinates
(2 × 8 bits) of the processing unit that found the factor, and the least significant
8 bits of the corresponding r-value. Note that, due to the ‘cleaning process’
described in [GS03c, Appendix B], the end of the list of factors is marked with
an ‘all zero’ entry. The output values are received by supporting hardware that
has to perform the final smoothness testing. Using the available 59-bit bus for
this purpose, reading out the results should require less than 700 clock cycles. Of
course, before outputting the results, we have to be sure that the sieving of the
subinterval is indeed complete. However, there is no need to use complicated
logic for this: from a simulation one can determine a reasonable upper bound
for the number of clock cycles that are needed to complete the sieving of a
subinterval; for $S = 2^{24}$ on a 256 × 256 mesh such a bound can be
39,500 clock cycles (this estimate is based on simulations by means of the
computer algebra system Magma [BCP97]). After that time we simply instruct
each processing unit to clear its input and output buffer, to complete any missing
updates of its r-values, and to output its results. In the worst case (say the
routing circuit encountered an infinite loop), potentially useful (a, b)-pairs from
a single subinterval of length S are lost in this way.

If the next subinterval to be sieved is in the same line, i.e., the b-value does
not change, each processing unit simply has to reset its 256 × 2 10-bit counters

Recall that the comparator is needed only in every second node.
to zero, and is then ready to sieve the next subinterval of length S. As
in [GS03a], in this case no new data has to be loaded into the device, as all
r-values have already been updated during the processing of the last sieving
interval. Thus the change to the next sieving interval can be estimated to take
no more than 500 clock cycles. Finally, passing to the next b-value requires a
replacement of the stored r-values in the processing units, analogously to
[GS03a]. Using the 59-bit bus, 2000 pins per chip, and an I/O clocking rate of
133 MHz, we expect that loading the data into the DRAM takes no more than
0.02 seconds.

5 An Improvement 

Before discussing the performance of the above design, one may ask about
possible improvements of the discussed parameters and the design. E.g., one may
think of using larger or smaller mesh sizes, different values for S, or changing
the number of primes that each processing unit deals with locally. In this paper
we focus on one parameter choice (which we consider reasonable), but there is still
leeway for experimenting here: lacking a theoretical analysis of the underlying
routing algorithm, we cannot make reliable theoretical predictions.

In [GS03c, Appendix G] a significant improvement is described which affects 
the physical arrangement of the processing units as discussed in Section 3.1. The 
basic idea is to use four interleaving meshes instead of one. Certainly there are
also other modifications of the above design, but in the estimates given in the
next section we restrict ourselves to this improvement using four meshes.



6 Space Requirements and Performance of the Device 

Due to the ‘small’ DRAM banks, we take into account 0.3 μm² (0.5 μm²) for a DRAM
bit with sequential (random) access and 2.8 μm² per transistor, which
is somewhat larger than the estimates in [LSTT02, Table 2] for a specialized
0.13 μm DRAM process. Combined with our estimates for the sizes of the node
parts, this yields an estimated total size of a 256 × 256 mesh of ≈ (4.9 cm)².
Here the use of four meshes as described in [GS03c, Appendix G] is assumed.

Having in mind the comparatively regular layout and that 90 nm processes
are becoming more widespread, manufacturing such chips does not seem to be
unrealistic (recall that for the proposed TWIRL parameters, which are not
optimized for chip size, the algebraic sieve alone already has an estimated size of
≈ (6.7 cm)² [ST03, Section 4.4]). In contrast to TWIRL, in the above mesh-based
design a single device handles both factor bases, and we do not require any
inter-chip communication, which is in particular helpful for cooling the chips.
But what about the performance of the above device, which we assume to be
clocked at 500 MHz (instead of 1 GHz in TWIRL, where the time-critical
operations are simpler)? Assuming a sieve line width of $3.4 \cdot 10^{13}$ (see [ST03,
Table 1]) and that 40,000 clock cycles are needed per sieving subinterval of size
$S = 2^{24}$, a single chip with a mesh of size 256 × 256 can process a sieve line in
≈ 163 seconds. This is almost a factor 20 slower than a TWIRL cluster consisting
of four rational sieves (each of size ≈ (3.6 cm)²) and one algebraic sieve (of
size ≈ (6.7 cm)²). However, while only six TWIRL clusters fit on a single 300 mm
silicon wafer, we can fit 21 mesh circuits of size 256 × 256 on such a wafer. With
21 chips we can handle a sieve line in ≈ 7.8 seconds, which is ≈ 6.3 times longer
than with a wafer holding six TWIRL clusters.
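The per-line estimate can be re-derived with a few lines of arithmetic. This sketch uses only the figures quoted above, taking $S = 2^{24}$ (256 × 256 nodes with 256 sieving locations each). It gives about 162 s for one chip and 7.7 s for 21 chips, in line with the quoted ≈ 163 s and ≈ 7.8 s. The small gap is presumably rounding in the inputs.

```python
# Back-of-the-envelope check of the sieve-line time from the quoted figures.
LINE_WIDTH = 3.4e13              # sieve line width [ST03, Table 1]
S = 2 ** 24                      # subinterval handled by one 256 x 256 mesh
CYCLES_PER_SUBINTERVAL = 40_000  # sieving bound plus switching overhead
CLOCK_HZ = 500e6                 # assumed clock rate of the mesh chip
CHIPS_PER_WAFER = 21


def seconds_per_line(chips: int = 1) -> float:
    """Time to sieve one line when `chips` meshes work on it in parallel."""
    subintervals = LINE_WIDTH / S
    return subintervals * CYCLES_PER_SUBINTERVAL / CLOCK_HZ / chips


single = seconds_per_line()
wafer = seconds_per_line(CHIPS_PER_WAFER)
print(f"one chip: {single:.0f} s, one wafer: {wafer:.1f} s")
```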

Moreover, in [ST03] the authors explain how to exclude sieve regions where a
and b have a common divisor 2 or 3. The common divisor 2 can be handled easily
by our device: essentially, we have to add 2p to r instead of p. Handling the
common divisor 3 is in principle possible, but would require significant additional
logic, and we do not consider such a modification here. Hence, instead of an
‘essentially free’ 33% time reduction in TWIRL, we assume only an ‘essentially
free’ 25% time reduction. Thus, for completing all expected $8.9 \cdot 10^{6}$ sieving lines
(see [ST03, Table 1]) for a 768-bit number, with one wafer we expect that ≈ 600
days are needed, roughly 6.3 times more than with TWIRL. But as smaller,
regular chips are simpler to produce, and as our design does not rely on inter-chip
communication, from a practical point of view the mesh-based approach
might be an interesting alternative to TWIRL clusters.
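As a sanity check of the ≈ 600-day figure, the sketch below combines the per-wafer line time with the number of sieve lines and the 25% reduction. It is only arithmetic on the quoted numbers and yields about 603 days.

```python
# Total sieving time for a 768-bit number on one wafer (21 mesh chips).
SIEVE_LINES = 8.9e6            # expected number of sieve lines [ST03, Table 1]
SECONDS_PER_LINE_WAFER = 7.8   # quoted time per sieve line for one wafer
TIME_FACTOR = 0.75             # 'essentially free' 25% reduction (skip 2 | gcd(a, b))

total_days = SIEVE_LINES * SECONDS_PER_LINE_WAFER * TIME_FACTOR / 86_400
print(f"about {total_days:.0f} days")
```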

7 Conclusions and Further Work 

The above discussion shows that building a mesh-based sieving device for 768-bit 
numbers could be feasible with current technology. Depending on the number of 
chips one is willing to use, performing the sieving step for such numbers within 
a few months seems feasible. In comparison to the proposed TWIRL clusters 
(which are not optimized for chip size), the chips in our design are smaller, 
no inter-chip communication is involved, and the rather regular layout should 
simplify the production of a detailed hardware layout. A main drawback of the 
mesh-based approach is a slow-down by a factor of ≈ 6.3 compared to TWIRL.
However, the simpler hardware requirements might outweigh this drawback. Also, we
would like to emphasize that the discussed mesh is certainly not optimal, and
modifying some of the parameter choices may yield relevant speed-ups. E.g.,
experiments show that if one is willing to allow for larger output buffers (which of
course increases the chip size), the required number of clock cycles per sieving 
subinterval can be reduced. 

Moreover, to further reduce the chip size, one can think of using a smaller 
mesh with only 128 x 128 nodes. According to our experiments, such a design is 
less efficient, but of course the resulting chips are smaller and producing them can 
be expected to be cheaper. On the other hand, one can ask whether implementing 
a larger 512 x 512 mesh is still feasible and whether a significant speed-up is 
possible in this way. We do not have enough experimental results to give a reliable
answer here, but exploring larger meshes is certainly worthwhile, in particular 
in regard to 1024-bit numbers: it is a natural question to ask to what extent 
the above mesh-based approach can deal with 1024-bit numbers. We cannot 
give a satisfying answer here at the moment. However, due to the compact 
representation of the factor bases in our device, it is certainly worthwhile to 
explore the 1024-bit case in more detail: the DRAM required for storing the
factor bases is one of the critical issues in TWIRL, and it seems to be a very 
interesting question to explore the potential of the above mesh-based approach 
for 1024-bit numbers. 
Abstract. We describe a block-cipher mode of operation, EME, that
turns an n-bit block cipher into a tweakable enciphering scheme that acts
on strings of mn bits, where $m \in [1..n]$. The mode is parallelizable, but
as serial-efficient as the non-parallelizable mode CMC [6]. EME can be 
used to solve the disk-sector encryption problem. The algorithm entails 
two layers of ECB encryption and a “lightweight mixing” in between. We 
prove EME secure, in the reduction-based sense of modern cryptography. 
We motivate some of the design choices in EME by showing that a few 
simple modifications of this mode are insecure. 



1 Introduction 

Tweakable enciphering schemes. A tweakable enciphering scheme is a function
$\mathbf{E}$ that maps a plaintext P into a ciphertext $C = \mathbf{E}_K^T(P)$ under the control
of a key K and tweak T. The ciphertext must have the same length as the
plaintext and there must be an inverse to $\mathbf{E}_K^T$. We are interested in schemes
that are secure in the sense of a tweakable, strong pseudorandom permutation
($\pm\widetilde{\mathrm{prp}}$): an oracle that maps (T, P) into $\mathbf{E}_K^T(P)$ and maps (T, C) into $(\mathbf{E}_K^T)^{-1}(C)$
must be indistinguishable (when the key K is random and secret) from an oracle
that realizes a T-indexed family of random permutations and their inverses. A
tweakable enciphering scheme that is secure in the $\pm\widetilde{\mathrm{prp}}$ sense makes a desirable
tool for solving the disk-sector encryption problem: one stores at disk-sector
location T the ciphertext $C = \mathbf{E}_K^T(P)$ for plaintext P. The IEEE Security in Storage
Working Group [9] plans to standardize a $\pm\widetilde{\mathrm{prp}}$-secure enciphering scheme.

Our contribution. This paper specifies EME, which is a simple and parallelizable
tweakable enciphering scheme. The scheme is built from a block cipher,
such as AES. By making EME parallelizable we accommodate ultra-high-speed 
mass-storage devices to the maximal extent possible given our security goals. 
When based on a block cipher $E: \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$ our mode uses a
k-bit key and 2m + 1 block-cipher calls to encipher an mn-bit plaintext in a way
that depends on an n-bit tweak. We require that $m \in [1..n]$.

The name EME is meant to suggest ECB-Mix-ECB, as enciphering under 
EME involves ECB-encrypting the plaintext, a lightweight mixing step, and 
another ECB-encryption. For a description of EME look ahead to Figs. 1 and 2. 

We prove that EME is secure, assuming that the underlying block cipher is
secure. The proof is in the standard, provable-security tradition: an attack on
EME (as a $\pm\widetilde{\mathrm{prp}}$ with domain $\mathcal{M} = \{0,1\}^n \cup \{0,1\}^{2n} \cup \cdots \cup \{0,1\}^{n^2}$) implies an
attack on the underlying block cipher (as a strong PRP with domain $\{0,1\}^n$).

We go on to motivate some of the choices made in EME by showing that 
other choices would result in insecure schemes. Finally, we suggest an extension 
to EME that operates on sectors that are longer than $n^2$ bits.

CMC MODE. The EME algorithm is follow-on work to the CMC method of 
Halevi and Rogaway [6]. Both modes are tweakable enciphering schemes built 
from a block cipher $E: \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$. But CMC is inherently
sequential, as it is built around CBC, while EME overcomes this limitation, 
which was seen as potentially problematic for high-speed encryption devices. 
The change does not increase the serial complexity: both modes use about 2m 
block-cipher calls (and little additional overhead) to act on an mn-bit string. 

Further history. Naor and Reingold gave an elegant approach for making a 
strong PRP on N bits from a block cipher on n < N bits [16,15]. Their approach
involves a hashing step, a layer of ECB encryption (say), and another hashing 
step. They do not give a fully-specified mode, but they do show how to carry 
out the hashing step given an xor-universal hash-function that maps N bits 
to n bits [15]. In practice, instantiating this object is problematic: to compare 
well with CMC or EME one should find a hash-function construction that is 
computationally simpler than CBC-AES, both in hardware and software, and 
has a collision bound of about $2^{-128}$. No such construction is known.

An early, unpublished version of the CMC paper contained buggy versions 
of the CMC and EME algorithms. Joux discovered the problem [10] and thereby 
played a key role in our arriving at a correct solution. CMC was easily fixed 
in response to Joux’s attack, but EME did not admit a simple fix. Indeed, 
Section 6.1 in this paper effectively proves that no simple fix is possible for 
the earlier EME construction. 

Efforts to construct a block cipher with a large blocksize from one with a 
smaller blocksize go back to Luby and Rackoff [14], whose work can be viewed 
as building a 2n-bit block cipher from an n-bit one. They also put forward the 
notion of a PRP and a strong (“super”) PRP. The first attempt to directly
construct an mn-bit block cipher from an n-bit one is due to Zheng, Matsumoto,
and Imai [19]. A different approach is to build a wide-blocksize block cipher
from scratch, as with BEAR, LION, and Mercy [1,4]. The definition of a
tweakable block cipher is due to Liskov, Rivest, and Wagner [13]. An earlier work by
Schroeppel suggested the idea of a tweakable block-cipher, by designing a cipher 
that natively incorporates a tweak [18]. The concrete-security treatment of PRPs
that we use begins with Bellare, Kilian, and Rogaway [2]. 

Discussion. EME has some advantages over CMC beyond its parallelizability. 
First, it uses a single key for the underlying block cipher, instead of two keys. 
All block-cipher calls are keyed by this one key. Second, enciphering under EME 
uses only the forward direction of the block cipher, while deciphering now uses 
only the reverse direction. This is convenient when using a cipher such as AES, 
where the two directions are substantially different, as a piece of hardware or 
code might need only to encipher or only to decipher. Finally, we prove EME 
secure as a variable-input-length (VIL) cipher and not just as a fixed-input-length
(FIL) one. This means that EME remains secure even if the adversary
intermixes plaintexts and ciphertexts of various lengths during its attack. 

We comment that the parallelizability goal is arguably of less utility for a
$\pm\widetilde{\mathrm{prp}}$-secure enciphering scheme than for some other cryptographic goals. This
is because, parallelizable or not, a $\pm\widetilde{\mathrm{prp}}$-secure encryption scheme cannot avoid
having latency that grows with the length of the message being processed (to
achieve the $\pm\widetilde{\mathrm{prp}}$ security notion one cannot output a single bit of ciphertext
until the entire plaintext has been seen). Still, parallelizability is useful even here,
and the user community wants it [8]. More broadly, EME continues a tradition
of trying to make modes of operation that achieve parallelizability at near-zero
added computational cost compared to their intrinsically serial counterparts
(examples include CTR mode, IAPM [11], and PMAC [3]).

2 Preliminaries 

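Basics. We use the same notions and notations as in [6]. A tweakable enciphering
scheme is a function $\mathbf{E}: \mathcal{K} \times \mathcal{T} \times \mathcal{M} \to \mathcal{M}$, where $\mathcal{M} = \bigcup_{i \in I} \{0,1\}^i$ is the message
space (for some nonempty index set $I \subseteq \mathbb{N}$), $\mathcal{K} \neq \emptyset$ is the key space, and $\mathcal{T} \neq \emptyset$
is the tweak space. We require that for every $K \in \mathcal{K}$ and $T \in \mathcal{T}$ we have that
$\mathbf{E}(K, T, \cdot) = \mathbf{E}_K^T(\cdot)$ is a length-preserving permutation on $\mathcal{M}$. The inverse of an
enciphering scheme $\mathbf{E}$ is the enciphering scheme $\mathbf{D} = \mathbf{E}^{-1}$, where $X = \mathbf{D}_K^T(Y)$
if and only if $\mathbf{E}_K^T(X) = Y$. A block cipher is the special case of a tweakable
enciphering scheme where the message space is $\mathcal{M} = \{0,1\}^n$ (for some $n \ge 1$)
and the tweak space is $\mathcal{T} = \{\varepsilon\}$ (the empty string). The number n is called
the blocksize. By Perm(n) we mean the set of all permutations on $\{0,1\}^n$. By
$\mathrm{Perm}^{\mathcal{T}}(\mathcal{M})$ we mean the set of all functions $\boldsymbol{\pi}: \mathcal{T} \times \mathcal{M} \to \mathcal{M}$ where $\boldsymbol{\pi}(T, \cdot)$ is a
length-preserving permutation.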

An adversary A is a (possibly probabilistic) algorithm with access to some
oracles. Oracles are written as superscripts. By convention, the running time of
an algorithm includes its description size. The notation $A \Rightarrow 1$ describes the
event that the adversary A outputs the bit one.

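Security measure. For a tweakable enciphering scheme $\mathbf{E}: \mathcal{K} \times \mathcal{T} \times \mathcal{M} \to \mathcal{M}$
we consider the advantage that the adversary A has in distinguishing $\mathbf{E}$ and its
inverse from a random tweakable permutation and its inverse:

$$\mathrm{Adv}^{\pm\widetilde{\mathrm{prp}}}_{\mathbf{E}}(A) = \Pr\left[K \xleftarrow{\$} \mathcal{K} : A^{\mathbf{E}_K(\cdot,\cdot),\ \mathbf{E}_K^{-1}(\cdot,\cdot)} \Rightarrow 1\right] - \Pr\left[\boldsymbol{\pi} \xleftarrow{\$} \mathrm{Perm}^{\mathcal{T}}(\mathcal{M}) : A^{\boldsymbol{\pi}(\cdot,\cdot),\ \boldsymbol{\pi}^{-1}(\cdot,\cdot)} \Rightarrow 1\right]$$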



The notation shows, in the brackets, an experiment to the left of the colon and
an event to the right of the colon. We are looking at the probability of the
indicated event after performing the specified experiment. By $X \xleftarrow{\$} \mathcal{X}$ we mean
to choose X at random from the finite set $\mathcal{X}$. In writing $\pm\widetilde{\mathrm{prp}}$ the tilde serves
as a reminder that the PRP is tweakable and the ± symbol is a reminder that
this is the “strong” (chosen plaintext/ciphertext attack) notion of security. For
a block cipher, we omit the tilde.

Without loss of generality we assume that an adversary never repeats an 
encipher query, never repeats a decipher query, never queries its deciphering 
oracle with (T, C) if it got C in response to some (T, M) encipher query, and 
never queries its enciphering oracle with (T, M) if it earlier got M in response to 
some (T, C) decipher query. We call such queries pointless because the adversary 
“knows” the answer that it should receive. 

When $\mathcal{R}$ is a list of resources and $\mathrm{Adv}^{\mathrm{xxx}}_{\Pi}(A)$ has been defined, we write
$\mathrm{Adv}^{\mathrm{xxx}}_{\Pi}(\mathcal{R})$ for the maximal value of $\mathrm{Adv}^{\mathrm{xxx}}_{\Pi}(A)$ over all adversaries A that
use resources at most $\mathcal{R}$. Resources of interest are the running time t, the
number of oracle queries q, and the query complexity $\sigma_n$ (where $n \ge 1$ is a
number). The query complexity $\sigma_n$ is measured as follows. A string X contributes
$\max\{|X|/n, 1\}$ to the query complexity; a tuple of strings $(X_1, X_2, \ldots)$
contributes the sum of the contributions of each string; and the query complexity
of an adversary is the sum of the contributions from all oracle queries plus the
contribution from the adversary’s output. So, for example, an adversary that
asks oracle queries $(T_1, P_1) = (0^n, 0^{2n})$ and then $(T_2, P_2) = (0^n, \varepsilon)$ and then
outputs a bit b has query complexity 3 + 2 + 1 = 6. The name of an argument
(e.g., t or $\sigma_n$) will be enough to make clear what resource it refers to.
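The counting rule for $\sigma_n$ is easy to mechanize. In this sketch, bit strings are modelled as Python `str` objects over `'0'`/`'1'` and the function names are hypothetical. It reproduces the example above, where the two queries and the output bit give 3 + 2 + 1 = 6.

```python
from fractions import Fraction


def string_cost(x: str, n: int) -> Fraction:
    """Contribution max{|X|/n, 1} of a single bit string X."""
    return max(Fraction(len(x), n), Fraction(1))


def query_complexity(queries: list[tuple[str, ...]], output: str, n: int) -> Fraction:
    """sigma_n: sum over all strings in all queries, plus the adversary's output."""
    total = sum((string_cost(x, n) for q in queries for x in q), Fraction(0))
    return total + string_cost(output, n)


n = 128
queries = [("0" * n, "0" * (2 * n)), ("0" * n, "")]   # (T1, P1), (T2, P2)
print(query_complexity(queries, "1", n))
```

Exact fractions are used so that strings whose length is not a multiple of n are counted without rounding.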

Finite fields. We interchangeably view an n-bit string as: a string; a nonnegative
integer less than $2^n$ (msb first); a formal polynomial over GF(2) (with the
coefficient of $x^{n-1}$ first and the free term last); and an abstract point in the finite
field $\mathrm{GF}(2^n)$. To do addition on field points, one xors their string
representations. To do multiplication on field points, one must fix a degree-n irreducible
polynomial. We choose to use the lexicographically first primitive polynomial of
minimum weight. For n = 128 this is the polynomial $x^{128} + x^7 + x^2 + x + 1$. See [5]
for a list of the indicated polynomials. We note that with this choice of
field-point representations, the point $x = 0^{n-2}10 = 2$ will always have order $2^n - 1$
in the multiplicative group of $\mathrm{GF}(2^n)$, meaning that $2, 2^2, 2^3, \ldots, 2^{2^n - 1}$ are all
distinct. Finally, we note that given $L = L_{n-1} \cdots L_1 L_0 \in \{0,1\}^n$ it is easy to
compute 2L. We illustrate the procedure for n = 128, in which case $2L = L \ll 1$
if $\mathrm{firstbit}(L) = 0$, and $2L = (L \ll 1) \oplus \mathrm{Const87}$ if $\mathrm{firstbit}(L) = 1$.
Here $\mathrm{Const87} = 0^{120}10^{4}1^{3}$, $\mathrm{firstbit}(L)$ means $L_{n-1}$, and $L \ll 1$ means
$L_{n-2} L_{n-3} \cdots L_1 L_0 0$.
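The doubling rule translates directly into integer operations if an n-bit string is read as an integer with $L_{n-1}$ as the most significant bit. The sketch below assumes n = 128. The helper `mul` is a hypothetical addition that shows how general multiplication can be built from doubling by Horner's rule. EME itself only needs multiplication by powers of 2.

```python
N = 128
MASK = (1 << N) - 1
CONST87 = 0x87   # 0^120 10000111, i.e. x^7 + x^2 + x + 1


def firstbit(L: int) -> int:
    """The bit L_{n-1} (most significant bit)."""
    return L >> (N - 1)


def double(L: int) -> int:
    """Return 2L in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1."""
    shifted = (L << 1) & MASK            # L << 1: drop L_{n-1}, append a 0
    return shifted ^ CONST87 if firstbit(L) else shifted


def mul(a: int, b: int) -> int:
    """Multiply a * b in GF(2^128) using only double() and xor."""
    result = 0
    for bit in reversed(range(N)):       # Horner's rule, most significant bit first
        result = double(result)
        if (b >> bit) & 1:
            result ^= a
    return result
```

For instance, `double(1 << 127)` wraps around to `0x87`, matching $2 \cdot x^{127} = x^{128} = x^7 + x^2 + x + 1$.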
Algorithm E_K^T(P_1 ··· P_m)
100  L ← 2 E_K(0^n)
101  for i ∈ [1 .. m] do
102      PP_i ← 2^{i-1} L ⊕ P_i
103      PPP_i ← E_K(PP_i)
110  SP ← PPP_2 ⊕ ··· ⊕ PPP_m
111  MP ← PPP_1 ⊕ SP ⊕ T
112  MC ← E_K(MP)
113  M ← MP ⊕ MC
114  for i ∈ [2 .. m] do CCC_i ← PPP_i ⊕ 2^{i-1} M
115  SC ← CCC_2 ⊕ ··· ⊕ CCC_m
116  CCC_1 ← MC ⊕ SC ⊕ T
120  for i ∈ [1 .. m] do
121      CC_i ← E_K(CCC_i)
122      C_i ← CC_i ⊕ 2^{i-1} L
130  return C_1 ··· C_m

Algorithm D_K^T(C_1 ··· C_m)
200  L ← 2 E_K(0^n)
201  for i ∈ [1 .. m] do
202      CC_i ← 2^{i-1} L ⊕ C_i
203      CCC_i ← E_K^{-1}(CC_i)
210  SC ← CCC_2 ⊕ ··· ⊕ CCC_m
211  MC ← CCC_1 ⊕ SC ⊕ T
212  MP ← E_K^{-1}(MC)
213  M ← MC ⊕ MP
214  for i ∈ [2 .. m] do PPP_i ← CCC_i ⊕ 2^{i-1} M
215  SP ← PPP_2 ⊕ ··· ⊕ PPP_m
216  PPP_1 ← MP ⊕ SP ⊕ T
220  for i ∈ [1 .. m] do
221      PP_i ← E_K^{-1}(PPP_i)
222      P_i ← PP_i ⊕ 2^{i-1} L
230  return P_1 ··· P_m



Fig. 1. Enciphering and deciphering under $\mathbf{E}$ = EME[E], where $E: \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$
is a block cipher. The tweak is $T \in \{0,1\}^n$, the plaintext is
$P = P_1 \cdots P_m$, and the ciphertext is $C = C_1 \cdots C_m$.
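A minimal runnable sketch of Fig. 1 follows, assuming n = 128 and the field representation of Section 2. Python's standard library has no AES, so `ToyCipher` is a hypothetical 128-bit Feistel stand-in built from HMAC-SHA256. It is not AES and has not been vetted; it only serves to make the mode executable. The helper `eme_core` reflects a symmetry visible in Fig. 1: deciphering is enciphering with $E_K$ replaced by $E_K^{-1}$, while L is still computed with the forward direction.

```python
import hashlib
import hmac
from functools import reduce
from operator import xor

N = 128
MASK = (1 << N) - 1
HALF = (1 << 64) - 1


def double(x: int) -> int:
    """Multiply by 2 in GF(2^128) mod x^128 + x^7 + x^2 + x + 1."""
    y = (x << 1) & MASK
    return y ^ 0x87 if x >> (N - 1) else y


class ToyCipher:
    """Hypothetical stand-in for E_K: 8-round HMAC-SHA256 Feistel (NOT AES)."""

    def __init__(self, key: bytes):
        self.key = key

    def _f(self, rnd: int, half: int) -> int:
        msg = bytes([rnd]) + half.to_bytes(8, "big")
        return int.from_bytes(hmac.new(self.key, msg, hashlib.sha256).digest()[:8], "big")

    def enc(self, x: int) -> int:
        left, right = x >> 64, x & HALF
        for rnd in range(8):
            left, right = right, left ^ self._f(rnd, right)
        return (left << 64) | right

    def dec(self, y: int) -> int:
        left, right = y >> 64, y & HALF
        for rnd in reversed(range(8)):
            left, right = right ^ self._f(rnd, left), left
        return (left << 64) | right


def masks(start: int, m: int) -> list[int]:
    """The values 2^{i-1} * start for i = 1..m."""
    out = [start]
    for _ in range(m - 1):
        out.append(double(out[-1]))
    return out


def eme_core(fwd, T: int, X: list[int], L: int) -> list[int]:
    """Lines 101-122 (fwd = E_K) or lines 201-222 (fwd = E_K^{-1}) of Fig. 1."""
    m = len(X)
    if not 1 <= m <= N:
        raise ValueError("EME needs 1 <= m <= n blocks")
    Lmask = masks(L, m)
    A = [fwd(Lmask[i] ^ X[i]) for i in range(m)]      # PPP_i (or CCC_i)
    MA = A[0] ^ reduce(xor, A[1:], 0) ^ T               # MP (or MC)
    MB = fwd(MA)                                        # MC (or MP)
    Mmask = masks(MA ^ MB, m)                           # 2^{i-1} M
    B = [0] + [A[i] ^ Mmask[i] for i in range(1, m)]    # CCC_i (or PPP_i), i >= 2
    B[0] = MB ^ reduce(xor, B[1:], 0) ^ T               # CCC_1 (or PPP_1)
    return [fwd(B[i]) ^ Lmask[i] for i in range(m)]      # C_i (or P_i)


def eme_encipher(E: ToyCipher, T: int, P: list[int]) -> list[int]:
    return eme_core(E.enc, T, P, double(E.enc(0)))


def eme_decipher(E: ToyCipher, T: int, C: list[int]) -> list[int]:
    return eme_core(E.dec, T, C, double(E.enc(0)))
```

Usage: with `E = ToyCipher(b"0123456789abcdef")`, the call `eme_decipher(E, 7, eme_encipher(E, 7, [1, 2, 3, 4]))` returns `[1, 2, 3, 4]`, while changing the tweak changes the ciphertext.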



3 Specification of EME 

We construct from block cipher $E: \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$ a tweakable enciphering
scheme that we denote by EME[E] or EME-E. The constructed enciphering
scheme has key space $\mathcal{K}$, the key space for E, and the tweak space is $\mathcal{T} = \{0,1\}^n$.
The message space $\mathcal{M} = \{0,1\}^n \cup \{0,1\}^{2n} \cup \cdots \cup \{0,1\}^{n^2}$ contains any string
having any number m of n-bit blocks, where $m \in [1..n]$. The definition of EME
is given in Fig. 1 and an illustration of EME is given in Fig. 2. In the figures,
all capitalized variables except for K are n-bit strings (key K is an element of
the key space $\mathcal{K}$). Variable names P and C are meant to suggest plaintext and
ciphertext. When we write $\mathbf{E}_K^T(P_1 \cdots P_m)$ we mean that the incoming plaintext
$P = P_1 \cdots P_m$ is silently partitioned into n-bit strings $P_1, \ldots, P_m$, and when
we write $\mathbf{D}_K^T(C_1 \cdots C_m)$ we mean that the incoming ciphertext $C = C_1 \cdots C_m$
is partitioned into n-bit strings $C_1, \ldots, C_m$. It is an error to provide $\mathbf{E}$ with
a plaintext that is not mn bits for some $m \in [1..n]$, or to supply $\mathbf{D}$ with a
ciphertext that is not mn bits for some $m \in [1..n]$.



4 Security of EME 

The following theorem relates the advantage an adversary can get in attacking
EME[E] to the advantage an adversary can get in attacking the block cipher E.
Fig. 2. Enciphering a four-block message $P_1 P_2 P_3 P_4$ under EME. The boxes represent
$E_K$ and $L = 2E_K(0^n)$. We set $SP = PPP_2 \oplus PPP_3 \oplus PPP_4$ and $M = MP \oplus MC$
and $SC = CCC_2 \oplus CCC_3 \oplus CCC_4$.

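Theorem 1. [EME security] Fix $n, t, \sigma_n \in \mathbb{N}$ and a block cipher $E: \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$. Then

$$\mathrm{Adv}^{\pm\widetilde{\mathrm{prp}}}_{\mathrm{EME}[\mathrm{Perm}(n)]}(\sigma_n) \le \cdots \qquad (1)$$

$$\mathrm{Adv}^{\pm\widetilde{\mathrm{prp}}}_{\mathrm{EME}[E]}(t, \sigma_n) \le \cdots + 2\,\mathrm{Adv}^{\pm\mathrm{prp}}_{E}(t', \sigma_n) \qquad (2)$$

where $t' = t + O(n\sigma_n)$.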

We note that the theorem does not restrict messages to one particular length: 
proven security is for a variable-input-length (VIL) cipher, not just a fixed-input-length
(FIL) one. The heart of Theorem 1 is Equation (1), whose proof is given
in the full version of this paper [7]. Equation (2) embodies the standard way to 
pass from the information-theoretic setting to the complexity-theoretic one. 
5 Proof Ideas

Since the proof of Theorem 1 is quite long, we give a brief sketch here of some of
its ideas. We consider an attack against EME as a game between the attacker and
the mode itself, where the cipher is replaced by a truly random permutation π,
and this permutation is chosen “on the fly” during this game. We give names to
all of the internal blocks that occur in the game, where an internal block is any
of the n-bit values $PP_i, PPP_i, MP, MC, CCC_i, CC_i$ that arise as the game is
played. For example, $PPP_i^s$ is the $PPP_i$-block of the s-th query of the attacker.

As usual with such modes, the core of the proof is to show that “accidental
collisions” are unlikely. An accidental collision is an equality between two internal
blocks that is not obviously guaranteed due to the structure of the mode.
Specifically, an equality between the blocks in two different encipher queries,
$P_i^r = P_i^s$, implies that we also have the equalities $PP_i^r = PP_i^s$ and $PPP_i^r = PPP_i^s$,
and so these do not count as collisions. (And likewise for decipher queries.) Most
other collisions are considered accidental collisions, and we show that those rarely
happen.¹ Showing that accidental collisions are rare is ultimately done by case
analysis (but, as usual, it takes a non-trivial argument to get there). For example,
in one case we show that with high probability $PP_i^s \neq \cdots$, in another case
we show that with high probability $PPP_i^s \neq MC^r$, etc.

The analysis of most of the cases is standard. Below we illustrate one of the
more interesting cases. We show that for an encipher query $P^s$, the block $MP^s$
does not collide with any of the previous $MP^r$ blocks. This is easily seen if any of
the plaintext blocks $P_i^s$ is a “new block” (i.e., different from $P_i^r$ for all $r < s$). But
we need to show it also for the case where the plaintext $P^s$ was obtained by
“mix-and-matching” blocks from previous plaintext vectors. So let $r < s$ be the index
of the last plaintext that shares some blocks with $P^s$, that is, $P_i^r = P_i^s$ for some
index i. This means that all the blocks $P_i^s$ appeared in queries no later than r.
If queries s and r sport the same plaintext vectors, $P^r = P^s$, and differ only in
the tweak values, $T^r \neq T^s$, then we clearly have $MP^r \oplus MP^s = T^r \oplus T^s \neq 0$.
So assume that $P^r \neq P^s$, let I be the set of indexes where they are equal, and
denote $R = [1..m^r] - I$ and $S = [1..m^s] - I$, where $m^r$ and $m^s$ are the lengths
(in blocks) of queries r and s. That is, $P_i^r = P_i^s$ exactly for all $i \in I$, which
means that all the blocks $P_i^s$ for $i \in S$ appeared in queries before query r. This,
in turn, implies that the value of $PPP_i^s$ for any $i \in S$ depends only on things
that were determined before query r.

Assume that query r was a decipher query (and that $MC^r$ did not already accidentally
collide with anything), so $MP^r$ was chosen “almost at random” during the
processing of query r. We show that the sum $MP^s \oplus MP^r$ can be expressed as
$\alpha MP^r + \beta$, where $\alpha \neq 0$ is a constant and $\beta$ is some expression that only
depends on things that were determined before the choice of $MP^r$. Thus, the sum

¹ Actually, we only care about collisions between two values in the domain of π or
between two values in its range; collisions between a domain value and a range
value, such as $PP_i^s = CC_j^r$, are inconsequential and we ignore those.
$MP^s \oplus MP^r$ is rarely zero. We can write this sum as

$$\begin{aligned}
MP^s \oplus MP^r &= T^s \oplus \bigoplus_{i=1}^{m^s} PPP_i^s \oplus T^r \oplus \bigoplus_{i=1}^{m^r} PPP_i^r \\
&= T^r \oplus T^s \oplus \bigoplus_{i \in S} PPP_i^s \oplus \bigoplus_{i \in R} PPP_i^r \\
&= \text{things-that-were-determined-before-query-}r \oplus \bigoplus_{i \in R} PPP_i^r
\end{aligned}$$

Assuming that R is non-empty, it is sufficient to show that we can express
$\bigoplus_{i \in R} PPP_i^r = \alpha MP^r + \beta$, where $\alpha$ is non-zero and $\beta$ only depends on things that
were determined before the choice of $MP^r$. There are two cases in this proof,
depending on whether $1 \in R$ or not, but they both boil down to the same point:
since we use the value $2^{i-1}(MC^r \oplus MP^r)$ to mask the $CCC_i$ block, the sum of the
$PPP_i^r$'s can be written as

$$\bigoplus_{i \in R} PPP_i^r = \text{some-expression-in-the-}CCC_i^r\text{'s-and-}MC^r \oplus \Big(\bigoplus_{i \in D'} 2^{i-1}\Big) MP^r$$

where D' is also a non-empty set, $D' \subseteq [1..m^r]$, and so the coefficient of $MP^r$
in this expression is non-zero. The case where query r is an encipher query is a bit
longer, but it uses similar observations.
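The non-zero coefficient relies on a simple algebraic fact that can be checked numerically. For $D' \subseteq [1..n]$, every $2^{i-1}$ is the monomial $x^{i-1}$ of degree below n, so no reduction happens and the sum is a non-zero bit mask. The sketch below assumes n = 128 and the field of Section 2, and `coefficient` is a hypothetical helper. Once indices beyond n are allowed, the sum can vanish, since $2^{128} = x^7 + x^2 + x + 1$. This hints at why the restriction $m \le n$ matters, but it only illustrates the algebra and is not the paper's attack.

```python
N = 128
MASK = (1 << N) - 1


def double(x: int) -> int:
    """Multiply by 2 in GF(2^128) mod x^128 + x^7 + x^2 + x + 1."""
    y = (x << 1) & MASK
    return y ^ 0x87 if x >> (N - 1) else y


def coefficient(D: set[int]) -> int:
    """Field element sum_{i in D} 2^{i-1}, computed by repeated doubling."""
    total, power = 0, 1              # power runs through 2^0, 2^1, 2^2, ...
    for i in range(1, max(D) + 1):
        if i in D:
            total ^= power
        power = double(power)
    return total


# Within [1..n] no reduction happens, so the result is just a non-zero bit mask.
D_ok = {1, 2, 3, 8, 128}
assert coefficient(D_ok) == sum(1 << (i - 1) for i in D_ok) != 0

# With an index beyond n the coefficient can vanish: 1 + x + x^2 + x^7 + x^128 = 0.
print(coefficient({1, 2, 3, 8, 129}))
```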

One last “trick” that is worth mentioning is the way we handle an adaptive
adversary. To bound the probability of accidental collisions we analyze this
probability in the presence of an augmented adversary that can specify both the
queries and their answers. That is, we let the adversary specify the entire
transcript (with some minor restrictions), then choose some “permutation” π that
maps the given queries to the given answers, and then consider the probability
of accidental collisions. Clearly, this augmented adversary is no longer adaptive,
hence the analysis becomes more tractable.

6 Some Insecure Modifications 

In this section we justify two of our design choices by showing that changing 
them would result in insecure schemes. Specifically, we show that the block- 
cipher call that sits in between the two ECB layers is effectively unavoidable, 
and we show that the length restriction $m \le n$ is also needed.

6.1 The Middle-Layer Block-Cipher Call Is Needed 

The EME construction has three block-cipher invocations in its “critical path” 
(that is, the construction is depth-3 in block-cipher gates). We now show that, 
in some sense, this is the best that you can do for a construction of this type.
Specifically, we show that for a construction of the type ECB-Mix-ECB,
implementing the intermediate mixing layer by any linear transformation always
results in an insecure scheme. This remains true even for an untweakable scheme,
even when one considers only fixed-input-length inputs, even when each
block-cipher call in each ECB encryption layer uses an independent key, and even if
the linear transformation in the middle is key-dependent. This result implies
that, as opposed to the Hash-Encrypt-Hash approach that was proven secure by
Naor and Reingold [16], the “dual” approach of Encrypt-Hash-Encrypt will not
be secure under typical assumptions.²

Formally, fix $m, n \in \mathbb{N}$ with $m \ge 2$, and let $E: \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$
be a block cipher. The scheme $\mathbf{E}$ = BrokenEME is defined on message space
$\{0,1\}^{mn}$ and key space $\mathcal{K}^{2m} \times \mathcal{L}$, where $\mathcal{L}$ is a set of invertible linear
transformations on $\{0,1\}^{mn}$. BrokenEME is keyed with 2m independent keys
$K_1, \ldots, K_m, K'_1, \ldots, K'_m \in \mathcal{K}$, and with an invertible (possibly secret) linear
transformation³ $\tau: \{0,1\}^{mn} \to \{0,1\}^{mn}$. To encipher a plaintext
$P = P_1 \cdots P_m \in \{0,1\}^{mn}$ we do the following:

- Set $PPP_i = E_{K_i}(P_i)$ for $i = 1, \ldots, m$. Let $PPP = PPP_1 \cdots PPP_m$ be the
concatenation of the $PPP_i$ blocks ($PPP \in \{0,1\}^{mn}$).

- Apply the linear transformation $\tau$ to obtain $CCC = CCC_1 \cdots CCC_m = \tau(PPP)$.

- Set $C_i = E_{K'_i}(CCC_i)$ for $i = 1, \ldots, m$. The ciphertext is the concatenation of
all the $C_i$ blocks, $C = C_1 \cdots C_m \in \{0,1\}^{mn}$.

Deciphering is done in the obvious way. 

We now give an adversary $A$ that attacks the mode, distinguishing it from a truly random permutation and its inverse using only four queries. Denote the adversary with its oracles as $A^{\mathcal{E},\mathcal{D}}$. The adversary $A$ picks two $mn$-bit plaintexts that differ only in their first block, namely $P^1 = P_1 P_2 \cdots P_m$ and $P^2 = P'_1 P_2 \cdots P_m$ (with $P_1 \ne P'_1$). Then $A$ queries its oracles as follows:

(1) Let $C^1 = C^1_1 \cdots C^1_m \leftarrow \mathcal{E}(P^1)$ and let $C^2 = C^2_1 \cdots C^2_m \leftarrow \mathcal{E}(P^2)$.

(2) Create two “complementing mixes” of the two ciphertexts, for example

$$C^3 = C^2_1 C^1_2 \cdots C^1_m \quad\text{and}\quad C^4 = C^1_1 C^2_2 \cdots C^2_m.$$

(3) Let $P^3 = P^3_1 \cdots P^3_m \leftarrow \mathcal{D}(C^3)$ and let $P^4 = P^4_1 \cdots P^4_m \leftarrow \mathcal{D}(C^4)$.

If the plaintext vectors $P^3$ and $P^4$ agree in all but their first block then $A$ outputs 1 (“real”); otherwise it outputs 0 (“random”). To see that this works, we denote the intermediate variables in the four queries by $PPP^i_j$ and $CCC^i_j$ ($i \in [1..4]$ and $j \in [1..m]$) and denote the “vector of differences” between $PPP^1$ and $PPP^2$ by $DP = DP_1 \cdots DP_m = PPP^1 \oplus PPP^2$. Since $P^1$ and $P^2$ agree everywhere except in their first block, and each $PPP_j$ depends only on $P_j$, it follows that the “vector of differences” $DP$ is also zero everywhere except in the first block. Similarly, we denote the “vector of differences” between $CCC^1$ and $CCC^2$ by $DC = DC_1 \cdots DC_m = CCC^1 \oplus CCC^2$. Since we computed $CCC^1 = \tau(PPP^1)$ and $CCC^2 = \tau(PPP^2)$, and $\tau$ is a linear transformation, it follows that $DC = \tau(PPP^1) \oplus \tau(PPP^2) = \tau(PPP^1 \oplus PPP^2) = \tau(DP)$. Recall now that for any $j \in [1..m]$ we have either $C^3_j = C^1_j$ and $C^4_j = C^2_j$, or $C^3_j = C^2_j$ and $C^4_j = C^1_j$. Since each $CCC_j$ is determined by $C_j$ alone, it follows that for all $j$, $CCC^3_j \oplus CCC^4_j = CCC^1_j \oplus CCC^2_j = DC_j$, namely $CCC^3 \oplus CCC^4 = DC$. Putting this together we now compute $PPP^3 \oplus PPP^4$ as:

$$PPP^3 \oplus PPP^4 = \tau^{-1}(CCC^3) \oplus \tau^{-1}(CCC^4) = \tau^{-1}(CCC^3 \oplus CCC^4) = \tau^{-1}(DC) = \tau^{-1}(\tau(DP)) = DP.$$

This means that $PPP^3_j = PPP^4_j$ for $j \in [2..m]$, and therefore also $P^3_j = P^4_j$ for all but the first block.

¹ This may seem somewhat surprising, as one may think that Encrypt-Hash-Encrypt should be at least as secure since it uses “more cryptography”.

² In fact, it is easy to see that the attack described in this section works also when $\tau$ is an affine transformation.
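The four-query distinguisher can be exercised on a toy instance. The sketch below is our own illustration, not the authors' code: it assumes 8-bit blocks and $m = 3$, uses independent random permutations as stand-ins for $E_{K_i}$ and $E_{K'_i}$, and uses a prefix-XOR map as a hypothetical choice of the invertible linear transformation $\tau$ (the argument above applies to any invertible linear $\tau$).

```python
import random

N_BITS = 8  # toy block size n in bits (real ciphers use n = 64 or 128)
M = 3       # number of blocks m >= 2

def make_perm(rng):
    """Toy stand-in for one keyed block cipher: a random permutation of n-bit blocks."""
    fwd = list(range(2 ** N_BITS))
    rng.shuffle(fwd)
    inv = [0] * len(fwd)
    for x, y in enumerate(fwd):
        inv[y] = x
    return fwd, inv  # index 0: forward table, index 1: inverse table

def tau(blocks):
    """Hypothetical invertible linear mix: y_i = x_1 xor ... xor x_i (prefix XOR)."""
    out, acc = [], 0
    for b in blocks:
        acc ^= b
        out.append(acc)
    return out

def tau_inv(blocks):
    """Inverse of tau: x_1 = y_1 and x_i = y_i xor y_{i-1}."""
    return [b ^ (blocks[i - 1] if i else 0) for i, b in enumerate(blocks)]

class BrokenEME:
    """ECB layer, linear mix tau, ECB layer, with 2m independent toy keys."""
    def __init__(self, seed):
        rng = random.Random(seed)
        self.k = [make_perm(rng) for _ in range(M)]   # E_{K_1}, ..., E_{K_m}
        self.k2 = [make_perm(rng) for _ in range(M)]  # E_{K'_1}, ..., E_{K'_m}

    def encipher(self, p):
        ppp = [self.k[i][0][p[i]] for i in range(M)]
        ccc = tau(ppp)
        return [self.k2[i][0][ccc[i]] for i in range(M)]

    def decipher(self, c):
        ccc = [self.k2[i][1][c[i]] for i in range(M)]
        ppp = tau_inv(ccc)
        return [self.k[i][1][ppp[i]] for i in range(M)]

def distinguisher(encipher, decipher):
    """A^{E,D}: four queries, output 1 ('real') or 0 ('random')."""
    p1 = [1, 2, 3]
    p2 = [4, 2, 3]                # differs from p1 only in the first block
    c1, c2 = encipher(p1), encipher(p2)
    c3 = [c2[0]] + c1[1:]         # complementing mixes of c1 and c2
    c4 = [c1[0]] + c2[1:]
    p3, p4 = decipher(c3), decipher(c4)
    return 1 if p3[1:] == p4[1:] else 0

results = [distinguisher(s.encipher, s.decipher) for s in map(BrokenEME, range(20))]
print(results)  # twenty 1s: the adversary always answers "real"
```

Against the real construction the adversary outputs 1 for every key, exactly as the difference argument predicts.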



6.2 The Length Restriction Is Needed 

Recall that EME is defined on message space $\mathcal{M} = \bigcup_{m \in [1..n]} \{0,1\}^{mn}$. Here we show that the restriction $m \le n$ is justified. In fact, we do not know whether allowing $m = n + 1$ breaks the security of EME, but we can show that allowing $m = n + 2$ permits easy distinguishing attacks. The details of the attack depend somewhat on the representation of the field $\mathrm{GF}(2^n)$. Below we demonstrate it for $n = 128$, where the field $\mathrm{GF}(2^{128})$ is represented using the polynomial $P_{128}(x) = x^{128} + x^7 + x^2 + x + 1$.

Assume that $m \ge n + 2$ and let $J$ be a nonempty proper subset of the indexes from 2 to $m$, $J \subset \{2, 3, \ldots, m\}$, $J \ne \emptyset$, such that in the field $\mathrm{GF}(2^n)$ we have $\sum_{j \in J} 2^{j-1} = 0$. For example, when $\mathrm{GF}(2^{128})$ is represented using $P_{128}$, we have $2^{129} + 2^8 + 2^3 + 2^2 + 2^1 = 2(2^{128} + 2^7 + 2^2 + 2^1 + 2^0) = 0$, and so we can set $J = \{130, 9, 4, 3, 2\}$. The attack proceeds as follows:

(1) Pick an arbitrary tweak $T$. All the queries in the attack will use the same tweak $T$. (In other words, the attack works also when EME is used as an untweakable scheme.)

(2) Pick two plaintext vectors that differ only in their first block, $P^1 = P_1 P_2 \cdots P_m$ and $P^2 = P'_1 P_2 \cdots P_m$ (with $P_1 \ne P'_1$). Encipher both plaintext vectors to get $C^1 = \mathcal{E}(T, P^1)$ and $C^2 = \mathcal{E}(T, P^2)$.

(3) Create a ciphertext vector $C^3$ such that

$$C^3_j = \begin{cases} C^2_j & \text{if } j \in J, \\ C^1_j & \text{if } j \notin J. \end{cases}$$

(4) Decipher $C^3$ to get $P^3 = \mathcal{D}(T, C^3)$.

Output 1 (“real”) if $P^3$ and $P^1$ agree in all the blocks $j \in ([2..m] \setminus J)$ and output 0 (“random”) otherwise. To see that this works we denote the intermediate variables in the three queries by $PPP^i_j$ and $CCC^i_j$ and $MP^i$ and $MC^i$ and $M^i$ (where $i \in [1..3]$ and $j \in [1..m]$).

We note that $PPP^1_j = PPP^2_j$ for all $j \in [2..m]$, and in particular for all $j \in J$. Also, from the construction of $C^3$ we get that $CCC^3_j = CCC^2_j$ for $j \in J$ and $CCC^3_j = CCC^1_j$ for $j \notin J$. Thus



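$$\begin{aligned}
MC^1 \oplus MC^3 &= \Big(T \oplus \bigoplus_{j=1}^{m} CCC^1_j\Big) \oplus \Big(T \oplus \bigoplus_{j=1}^{m} CCC^3_j\Big) = \bigoplus_{j \in J} \big(CCC^1_j \oplus CCC^3_j\big) = \bigoplus_{j \in J} \big(CCC^1_j \oplus CCC^2_j\big) \\
&= \bigoplus_{j \in J} \big((PPP^1_j \oplus 2^{j-1} M^1) \oplus (PPP^2_j \oplus 2^{j-1} M^2)\big) \\
&= \bigoplus_{j \in J} \big(2^{j-1} M^1 \oplus 2^{j-1} M^2\big) = (M^1 \oplus M^2) \cdot \sum_{j \in J} 2^{j-1} = 0.
\end{aligned}$$

Here the terms with $j \notin J$ (including $j = 1$) cancel because $CCC^3_j = CCC^1_j$ for those $j$.

So we have $MC^1 = MC^3$ and therefore also $MP^1 = MP^3$ and $M^1 = M^3$. Thus for any $j \notin J$, $j > 1$, we have $PPP^3_j = CCC^3_j \oplus 2^{j-1} M^3 = CCC^1_j \oplus 2^{j-1} M^1 = PPP^1_j$, and therefore also $P^3_j = P^1_j$.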
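The example set $J$ can be checked numerically. This short sketch assumes, as is usual for EME, that the field element written 2 is the polynomial $x$, and it stores elements of $\mathrm{GF}(2^{128})$ as 128-bit Python integers in polynomial basis. The helper names are hypothetical.

```python
# Reduction terms of P_128(x) = x^128 + x^7 + x^2 + x + 1 (the x^128 term is implicit).
P128_LOW = (1 << 7) | (1 << 2) | (1 << 1) | 1
MASK = (1 << 128) - 1

def times_x(a):
    """Multiply a GF(2^128) element by x (the element written '2' in the text)."""
    carry = a >> 127
    a = (a << 1) & MASK
    return a ^ P128_LOW if carry else a

def power_of_two(e):
    """Return 2^e, i.e. x^e reduced modulo P_128."""
    a = 1
    for _ in range(e):
        a = times_x(a)
    return a

J = {130, 9, 4, 3, 2}
total = 0
for j in J:
    total ^= power_of_two(j - 1)  # addition in GF(2^n) is XOR
print(total)  # 0
```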
A Extending EME to Longer Messages

The restriction on the message size of EME, $m \le n$, means, for example, that when using AES as the underlying cipher one cannot encrypt messages longer than 2KB. In some applications this restriction could be problematic. We now describe EME+, an extension of EME that can be used to handle messages of practically any length (as long as it is an integral number of blocks).

The idea is to divide the m-block input into “chunks” of at most n blocks 
each and in each chunk use a construction similar to EME. Specifically, in the 
first chunk we use exactly the same construction as in EME while in all the 
other chunks we use a similar construction, where we replace the addition of $SP \oplus T$ and $SC \oplus T$ (before and after the block-cipher call in between the two ECB layers) by additions of the mask $M_1$ from the first chunk.
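The 2KB figure and the chunk layout come down to simple arithmetic. In the sketch below, `eme_plus_chunks` is a hypothetical helper showing how block indices $1..m$ fall into chunks of at most $n$ blocks, so that chunk $j$ starts at block $(j-1)n + 1$. The block size of 128 bits is that of AES.

```python
def eme_plus_chunks(m, n):
    """Split block indices 1..m into consecutive chunks of at most n blocks (inclusive ranges)."""
    return [(start, min(start + n - 1, m)) for start in range(1, m + 1, n)]

n = 128                          # AES block size in bits
block_bytes = n // 8             # 16 bytes per block
max_eme_bytes = n * block_bytes  # EME allows at most n blocks
print(max_eme_bytes)             # 2048
print(eme_plus_chunks(n + 2, n)) # [(1, 128), (129, 130)]
```

An $(n+2)$-block message, as in Fig. 4, therefore consists of one full chunk followed by a two-block chunk.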

We specify in Fig. 3 the forward direction of our construction, $\mathbb{E} = \mathrm{EME}^{+}[E]$. An illustration of the EME+ mode is given in Fig. 4. One observes that EME+ is a “proper extension” of EME in that when we use it on a message of length $m \le n$ blocks, we get back the original EME mode.

Although we have not written a proof of security for EME+, we expect that such a proof can be written. One would follow the arguments in the proof for the basic EME, except that one needs to analyze a few more cases in the case analysis.
Algorithm $\mathbb{E}_K^T(P_1 \cdots P_m)$
    L ← 2·E_K(0^n)
    for i ∈ [1 .. m] do
        PP_i ← 2^(i−1)·L ⊕ P_i;  PPP_i ← E_K(PP_i)
    MP_1 ← PPP_1 ⊕ PPP_2 ⊕ ⋯ ⊕ PPP_m ⊕ T
    MC_1 ← E_K(MP_1);  M_1 ← MP_1 ⊕ MC_1
    for i ∈ [2 .. min(n, m)] do CCC_i ← PPP_i ⊕ 2^(i−1)·M_1
    for j ∈ [2 .. ⌈m/n⌉] do
        MP_j ← PPP_((j−1)n+1) ⊕ M_1
        MC_j ← E_K(MP_j);  M_j ← MP_j ⊕ MC_j
        CCC_((j−1)n+1) ← MC_j ⊕ M_1
        for i ∈ [(j−1)n+2 .. min(jn, m)] do
            CCC_i ← PPP_i ⊕ 2^(i−(j−1)n−1)·M_j
    CCC_1 ← MC_1 ⊕ CCC_2 ⊕ ⋯ ⊕ CCC_m ⊕ T
    for i ∈ [1 .. m] do
        CC_i ← E_K(CCC_i);  C_i ← CC_i ⊕ 2^(i−1)·L
    return C_1 ⋯ C_m

Fig. 3. Enciphering under $\mathbb{E} = \mathrm{EME}^{+}[E]$ with block cipher $E: \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$ and tweak $T \in \{0,1\}^n$ and plaintext $P_1 \cdots P_m$ and ciphertext $C_1 \cdots C_m$.



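[Figure: EME+ enciphering of the inputs P_1, P_2, …, P_n, P_{n+1}, P_{n+2} into the outputs C_1, C_2, …, C_n, C_{n+1}, C_{n+2}]

Fig. 4. Enciphering an $(n+2)$-block message under EME+. Here $L = 2E_K(0^n)$ and $SP = PPP_2 \oplus \cdots \oplus PPP_m$ and $M_1 = MP_1 \oplus MC_1$ and $SC = CCC_2 \oplus \cdots \oplus CCC_m$.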
Abstract. In [8] Vaudenay presented an attack on block cipher CBC-mode encryption when a particular padding method is used. In this
paper, we employ a similar approach to analyse the padding methods 
of the ISO CBC-mode encryption standard. We show that, for several 
of the padding methods referred to by this standard, we can exploit an 
oracle returning padding correctness information to efficiently extract 
plaintext bits. In particular, for one padding scheme, we can extract 
all plaintext bits with a near-optimal number of oracle queries. For a 
second scheme, we can efficiently extract plaintext bits from the last 
(or last-but-one) ciphertext block, and obtain plaintext bits from other 
blocks faster than exhaustive search. 

Keywords: padding oracle attack, CBC-mode encryption, ISO standard 



1 Introduction 

1.1 Background 

In [8] Vaudenay presented an attack on block cipher CBC-mode encryption when a particular padding method is used. The attack requires an oracle which, on receipt of a ciphertext, decrypts it and replies to the sender whether the padding is valid or not. The attack model assumes that the attacker has intercepted some ciphertext, padded and then CBC-mode encrypted under some key K, and has access to the aforementioned padding validity oracle (operating using the same key K). The result is that the attacker can recover the plaintext corresponding to any block of ciphertext using an average of 128b oracle calls, where b is the number of bytes in a block and a byte is eight bits.

Further research has been done by Black and Urtubia [1], who generalised Vaudenay’s attack to other padding schemes and modes of operation, and presented a padding method which prevents the attack. In [2], Canvel et al. demonstrated the practicality of padding oracle attacks and showed how subtleties in security protocol implementation can lead to flaws. First of all, they realised an SSL/TLS padding oracle by exploiting timing information that is available upon submission of correctly and incorrectly padded ciphertexts. Secondly, an attack against the IMAP protocol when used over SSL/TLS was implemented. In a typical setting, the attack recovers the IMAP password within one hour. Klíma and Rosa [7] applied the idea of a “format correctness oracle” (of which padding is a special case) to construct a PKCS#7 validity oracle and were able to decrypt one PKCS#7-formatted ciphertext byte with on average 128 oracle calls.



1.2 ISO Standards 

The current ISO standard for modes of operation of a block cipher is the second 
edition of ISO/IEC 10116 [4] (the third edition [5] is under development at 
the time of writing). It does not, however, specify any padding methods for the 
modes of operation (including CBC) that require one. In Section 5: Requirements 
it indicates that padding methods are beyond its scope and instead refers to 
ISO/IEC 9797-1 [3] (MACs using a block cipher) and 10118-1 [6] (general hash 
functions) where a few such methods are defined. Using a similar approach to 
[8], we have found attacks of varying severity against some of those methods when used with CBC-mode encryption. These attacks do not, however, entail any security implications for those padding methods when they are used within their proper contexts (i.e. MACs and hash functions).

Note that in Annex B.2.3 of ISO/IEC 10116, ciphertext-stealing and another 
method are described for the special treatment of the last two blocks when 
encrypting under CBC-mode, when padding the plaintext is not acceptable. 
The standard does not prescribe that these methods be used, only that they can 
be used instead of padding. We emphasise that we are not attacking these two 
methods, but rather the padding methods in ISO/IEC 9797-1 and 10118-1 that 
are recommended for use in ISO/IEC 10116. 

1.3 Our Contribution 

We assume that an attacker has access to a padding oracle operating under the 
fixed key K and has intercepted a ciphertext encrypted in CBC-mode under 
that key. The attacker’s aim is to recover the plaintext for that ciphertext. We 
further assume that the attacker is able to choose the initialisation vector when 
submitting ciphertexts to the oracle. This assumption prevents our attack from 
working when secret IVs are used; this is permitted in [4]. Some or all of these 
assumptions may be unwarranted when one is attacking a real system. 

Under the above assumptions, our main results are as follows: 

1. Against padding method 3 of [3], the attacker can recover the plaintext for every ciphertext block with $n + O(\log_2 n)$ oracle calls for each block, where $n$ is the block size.

2. There are two attacks against padding method 3 of [6], though they are to some degree interdependent. The padding method requires a parameter $r$ to be chosen where $1 \le r \le n$. In the first of our two attacks, the attacker can recover all plaintext bits to all ciphertext blocks with a complexity of oracle calls per block when $r < n$. When $r = n$ the complexity increases to $O(2^n)$. In our second attack, depending on which of two possible states the padding is in, the attacker either recovers the whole of the last plaintext
block with n + 0 (log 2 n) oracle calls, or recovers some u bits of the last-but- 
one plaintext block which then speeds up the first attack by a factor of 2““^ 
in recovering the remaining n — u bits of the block. 

We will first introduce some notation used throughout the paper, followed by 
a review of CBC-mode encryption. Then we present in turn each padding method 
in [3] and [6] and, if applicable, our attack against it. We conclude with a few 
remarks about the need for careful cryptographic design to prevent side-channel 
attacks. 

2 Symbols, Notation, and CBC-Mode Review 

2.1 Symbols and Notation 

Each symbol and notation will be introduced on its first use, but we find it
convenient to gather them here for reference purposes. 

$C$ : ciphertext output after CBC-mode encryption, and the ciphertext the attacker is trying to decrypt
$C'$ : ciphertext to be submitted to an oracle during an attack
$d_K(Y)$ : decryption of ciphertext block $Y$ under key $K$
$e_K(X)$ : encryption of plaintext block $X$ under key $K$
$D$ : unpadded data string to be CBC-mode encrypted
$I_j$ : the intermediate block during CBC-mode encryption, i.e. $D_j \oplus C_{j-1}$, or in the case $j = 1$, $D_1 \oplus IV$
$I'_j$ : the intermediate block during the attack, i.e. $d_K(C'_j)$
$IV$ : the initialisation vector used in CBC-mode
$L_D$ : the length (in bits) of the data string $D$
$n$ : the block size (in bits) of the block cipher
$P$ : the result of applying a given padding method to $D$
$P'$ : data string computed by the padding oracle in the course of verifying padding
$q$ : the number of blocks in data string $P$ after padding
VALID and INVALID : oracle responses to, respectively, correct and incorrect padding after receipt and decryption of some ciphertext
$X \| Y$ : the result of concatenation of strings $X$ and $Y$
$X \oplus Y$ : the result of exclusive-or (XOR) of strings $X$ and $Y$
$(X)_2$ : the binary representation of the value $X$
$X_j$ : the $j$-th block of the plaintext or ciphertext $X$
$X_{j,k}$ : the $k$-th bit of the plaintext or ciphertext block $X_j$
Fig. 1. CBC-mode encryption



2.2 Review of CBC-Mode Encryption 

Cipher Block Chaining (CBC) is a mode of operation for an n-bit block cipher for encrypting data of arbitrary length. It has been standardised in the second edition of ISO/IEC 10116 [4] and it is, quite naturally, included in the latest draft of the third edition of that standard [5].

Let the encryption operation of the block cipher under key $K$ be $e_K$, and the data we wish to encrypt be $D$. CBC-mode encryption (Figure 1) operates as follows:

1. A padding method is applied to $D$ to make a padded message $P$ whose bit-length is a multiple of $n$.

2. $P$ is divided into $n$-bit blocks $P_1, P_2, \ldots, P_q$.

3. An $n$-bit number is chosen, at random or in a specified way, as the initialisation vector $IV$.

4. Compute ciphertext block $C_1 = e_K(IV \oplus P_1)$ and then

5. $C_i = e_K(C_{i-1} \oplus P_i)$, for $2 \le i \le q$.

6. The resulting $C = IV \| C_1 \| C_2 \| \ldots \| C_q$ is the CBC-encrypted ciphertext.

We assume that $IV$ is always prepended to the ciphertext. This allows a more concise notation for our attacks to follow and means that $IV$ effectively plays the role of the “zeroth” ciphertext block; we write $C_0 = IV$.

Let $d_K$ denote the inverse operation to $e_K$. To decrypt a block $C_i$ (Figure 2) we simply have to compute $D_i = d_K(C_i) \oplus C_{i-1}$ for $2 \le i \le q$, and $D_1 = d_K(C_1) \oplus IV$.
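The encryption and decryption rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a fixed random permutation of 16-bit values stands in for $e_K$ (it is not a secure cipher), blocks are Python integers, and the names `E_TABLE`, `cbc_encrypt` and `cbc_decrypt` are hypothetical. Following the convention in the text, the IV is prepended so that it plays the role of $C_0$.

```python
import random

N = 16  # toy block size n in bits (real block ciphers use n = 64 or 128)

# Toy stand-in for e_K / d_K: a fixed random permutation of 16-bit blocks (not secure).
_rng = random.Random(2024)
E_TABLE = list(range(1 << N))
_rng.shuffle(E_TABLE)
D_TABLE = [0] * (1 << N)
for x, y in enumerate(E_TABLE):
    D_TABLE[y] = x

def cbc_encrypt(iv, plaintext_blocks):
    """C_1 = e_K(IV xor P_1), C_i = e_K(C_{i-1} xor P_i); returns [IV, C_1, ..., C_q]."""
    out = [iv]
    for p in plaintext_blocks:
        out.append(E_TABLE[out[-1] ^ p])
    return out

def cbc_decrypt(ciphertext):
    """Takes [C_0 = IV, C_1, ..., C_q]; returns [P_1, ..., P_q] with P_i = d_K(C_i) xor C_{i-1}."""
    return [D_TABLE[c] ^ prev for prev, c in zip(ciphertext, ciphertext[1:])]

blocks = [0x1234, 0xABCD, 0x0F0F]
ct = cbc_encrypt(iv=0x5555, plaintext_blocks=blocks)
print(cbc_decrypt(ct) == blocks)  # True
```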

Some security properties of CBC-mode are outlined in Section 2 of [8]. 

3 Attacking the Padding Methods of ISO/IEC 9797-1 

3.1 The Standard 

The standard [3] specifies six algorithms to compute an $m$-bit MAC using an $n$-bit block cipher with a secret key. The algorithms themselves are essentially instances of the CBC-MAC method or variants of it. Padding is applied, as with CBC-mode encryption, when the plaintext is not of length (in bits) a multiple of $n$, the block size. For some methods it is always applied, regardless of the plaintext length.

Fig. 2. CBC-mode decryption

3.2 Padding Method 1 

The method is described as follows: 

“The data string D to be input to the [...] algorithm shall be right-padded with as few (possibly none) ‘0’ bits as necessary to obtain a data
string whose length (in bits) is a positive integer multiple of n.” 

Notice that this method is many-to-one: different data strings may be padded 
to yield the same result, which means that padding cannot be removed unam- 
biguously if the length of the plaintext is not known. Consequently, given a 
padded data string, one cannot even tell where the data/padding boundary is, 
let alone check for padding validity. In fact, without data length information, 
every plaintext P is a validly padded version of at least one data string D. This 
of course limits the applicability of the padding technique to cases where the 
plaintext is of a fixed length, or where the proper length is somehow otherwise 
conveyed to the recipient. 

No attack can be based on information returned from a padding oracle, because any ciphertext submitted to such an oracle will decrypt to give a correctly padded plaintext.



3.3 Padding Method 2 

The method: 

“The data string D to be input to the [...] algorithm shall be right-padded with a single ‘1’ bit. The resulting string shall then be right-padded with as few (possibly none) ‘0’ bits as necessary to obtain a data
string whose length (in bits) is a positive integer multiple of n.” 
















K.G. Paterson and A. Yau 



This method has been analysed in [1] (it is called OZ-PAD in that paper). 
The key result of [1] is that the method appears to resist padding oracle attacks. 
This is because practically all data strings are correctly padded, with the only 
exception being when a block contains all ‘0’ bits. However this padding mecha- 
nism still lacks what is known as “semantic security” — an INVALID reply from 
the padding oracle would tell the attacker that the decrypted plaintext block is 
not a particular bit string. See [1] for details. 
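A minimal sketch of the corresponding unpadding check shows why almost every decryption is accepted. The function name is hypothetical, and bit strings are modelled as Python lists. Only an all-‘0’ final block is rejected, because the ‘1’ marker must lie in the last block. For a final block that decrypts to random bits, this happens with probability $2^{-n}$.

```python
def unpad_method2(bits, n):
    """Remove ISO/IEC 9797-1 padding method 2; return None when the padding is invalid."""
    assert bits and len(bits) % n == 0
    i = len(bits) - 1
    last_block_start = len(bits) - n
    while i >= last_block_start and bits[i] == 0:  # skip the trailing '0' bits
        i -= 1
    if i < last_block_start:  # last block is all '0': no '1' marker, so INVALID
        return None
    return bits[:i]           # drop the '1' marker and everything after it

print(unpad_method2([1, 0, 1, 1, 0, 0, 0, 0], 8))  # [1, 0, 1]
print(unpad_method2([0] * 8, 8))                   # None
```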

3.4 Padding Method 3 

The method (Figure 3): 

“The data string D to be input to the [...] algorithm shall be right-padded with as few (possibly none) ‘0’ bits as necessary to obtain a data string whose length (in bits) is a positive integer multiple of n. The resulting string shall then be left-padded with a block L. The block L consists of the binary representation of the length (in bits) $L_D$ of the unpadded data string D, left-padded with as few (possibly none) ‘0’ bits as necessary to obtain an n-bit block. The right-most bit of the block L corresponds to the least significant bit of the binary representation of $L_D$.”



[Figure: the block $(L_D)_2$, followed by DATA, followed by 0…0 (between 0 and n − 1 ‘0’ bits)]

Fig. 3. ISO/IEC 9797-1 padding method 3



We have an attack against this padding scheme that decrypts, a block at a time, arbitrary ciphertexts $C_1 \| C_2 \| \ldots \| C_q$. This attack takes $n + O(\log_2 n)$ oracle calls per block. There are two phases to this attack: determining $L_D$ and the actual decryption.



Phase 1: Determining $L_D$. We want to find $L_D$, the content of the first block, which indicates the length of the unpadded data. To do that we use the padding oracle to determine the number of ‘0’ bits that have been appended to the last block, if any. This is performed as follows (Figure 4).

Firstly, notice that in CBC-mode decryption, flipping (complementing) any single bit at position $i$ in block $C_j$ would also flip the decrypted plaintext bit at position $i$ in block $P_{j+1}$ (whilst corrupting the whole of plaintext block $P_j$).
[Figure: flipping the j-th bit in the last-but-one ciphertext block also flips the j-th bit in the last plaintext block, which may or may not be a padding ‘0’ bit]

Fig. 4. Attack phase 1: obtaining $L_D$



This allows us to flip arbitrary bits within a block in the decrypted plaintext by 
appropriately altering the ciphertext. This observation is in fact the basis of all 
of our attacks. 
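The bit-flipping observation is easy to check directly. In this sketch a random 16-bit permutation stands in for the block cipher, and the ciphertext blocks are arbitrary values. Both are illustrative assumptions, not part of the attack. Flipping bit $i$ of $C_j$ flips exactly bit $i$ of $P_{j+1}$, garbles $P_j$, and leaves earlier blocks alone.

```python
import random

N = 16  # toy block size n in bits

# Toy invertible "block cipher": a fixed random permutation of 16-bit values (not secure).
rng = random.Random(7)
E = list(range(1 << N))
rng.shuffle(E)
D = [0] * (1 << N)
for x, y in enumerate(E):
    D[y] = x

def cbc_decrypt(ct):
    """ct = [C_0 = IV, C_1, ..., C_q] -> [P_1, ..., P_q], with P_i = d_K(C_i) xor C_{i-1}."""
    return [D[c] ^ prev for prev, c in zip(ct, ct[1:])]

def flip(block, pos):
    """Complement bit `pos` of an n-bit block (positions 0..n-1, left to right)."""
    return block ^ (1 << (N - 1 - pos))

ct = [rng.randrange(1 << N) for _ in range(4)]  # IV, C_1, C_2, C_3
j, i = 2, 5                                     # flip bit i of C_j
tampered = list(ct)
tampered[j] = flip(ct[j], i)

before, after = cbc_decrypt(ct), cbc_decrypt(tampered)
# Python index j holds P_{j+1}; index j - 1 holds P_j.
print(before[j] ^ after[j] == 1 << (N - 1 - i))  # True: exactly bit i of P_{j+1} flipped
print(before[j - 1] != after[j - 1])             # True: P_j is corrupted
print(before[0] == after[0])                     # True: P_1 is untouched
```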

A padded data string consists of $q \ge 2$ blocks. Here we consider the case $q \ge 3$; the case $q = 2$ is handled separately below. The string is right-padded with some ‘0’ bits and left-padded with the length block containing the binary representation of $L_D$. $L_D$ is effectively a pointer to the last bit of the unpadded data, all the bits after which should be ‘0’. Let us now see what happens if we flip a single bit in $P_q$, the last plaintext block of the data string (by flipping a bit in $C_{q-1}$, the last-but-one ciphertext block). This change does not affect the decryption of $C_1$ (since $q \ge 3$), so the length block is left intact. So one of two things might happen:

1. The bit flipped is part of the original unpadded data. The padding is there- 
fore still intact and correct and the oracle returns VALID. 

2. The bit flipped is one of those ‘0’ bits padded. The oracle therefore detects 
a ‘1’ bit where it should have been ‘0,’ and thus returns INVALID. 

This means that after flipping a single bit in $P_q$, a VALID oracle response implies the padding boundary is to the right of the current position, and an INVALID response implies it is to the left. So we can now work out the exact location of the boundary by flipping the bits of the last plaintext block one at a time, say from right to left. The transition point of the oracle response from INVALID to VALID tells us the location of the boundary we are after. This can be made more efficient by using a binary search similar to that presented in Section 3 of [1]. Once we have the boundary it is trivial to compute the value $L_D$ from the number of blocks in the ciphertext and the position of the boundary within the last block.
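The binary search can be tried end to end on a toy instance. The sketch below is our own illustration, not the authors' code. It makes the following assumptions:

- A 16-bit random permutation replaces $e_K$.
- Bit strings are Python lists.
- The simulated oracle accepts a decryption only if two conditions hold: the number of data blocks equals $\max(1, \lceil L_D/n \rceil)$, and every bit after position $L_D$ is ‘0’.

Names such as `get_ld_general` are hypothetical. The function returns $(q-2)n + l + 1$, where $l$ is the position of the last data bit in $P_q$.

```python
import random

N = 16                                   # toy block size n in bits
rng = random.Random(11)
E = list(range(1 << N)); rng.shuffle(E)  # toy e_K: a random permutation (not secure)
D = [0] * (1 << N)
for x, y in enumerate(E):
    D[y] = x                             # toy d_K

def bit(block, pos):
    """Bit at position pos of an n-bit block (0 = leftmost)."""
    return (block >> (N - 1 - pos)) & 1

def flip(block, pos):
    return block ^ (1 << (N - 1 - pos))

def pad_m3(data_bits):
    """ISO/IEC 9797-1 method 3: right-pad with '0's to a positive multiple of n, prepend L_D."""
    bits = data_bits + [0] * (-len(data_bits) % N)
    if not bits:                         # empty D still yields one all-zero data block
        bits = [0] * N
    blocks = [int("".join(map(str, bits[i:i + N])), 2) for i in range(0, len(bits), N)]
    return [len(data_bits)] + blocks

def cbc_encrypt(iv, blocks):
    out = [iv]
    for p in blocks:
        out.append(E[out[-1] ^ p])
    return out                           # [IV, C_1, ..., C_q]

def oracle(ct):
    """Hypothetical padding oracle: True (VALID) iff the decryption is correctly padded."""
    p = [D[c] ^ prev for prev, c in zip(ct, ct[1:])]
    if len(p) < 2:
        return False
    length, data = p[0], p[1:]
    if max(1, -(-length // N)) != len(data):  # block count must match L_D
        return False
    return all(bit(data[pos // N], pos % N) == 0 for pos in range(length, len(data) * N))

def get_ld_general(ct):
    """Phase 1 for q >= 3: binary search for the data/padding boundary in P_q."""
    q = len(ct) - 1
    assert q >= 3
    c = list(ct)
    lo, hi = 0, N - 1
    while lo != hi:
        h = (lo + hi + 1) // 2           # ceil((lo + hi) / 2)
        c[q - 1] = flip(c[q - 1], h)     # flipping bit h of C_{q-1} flips bit h of P_q
        if oracle(c):
            lo = h                       # bit h was data: boundary is at or right of h
        else:
            hi = h - 1                   # bit h was padding: boundary is left of h
        c[q - 1] = flip(c[q - 1], h)     # restore C_{q-1}
    return (q - 2) * N + lo + 1

secret = [rng.randrange(2) for _ in range(37)]  # a 37-bit data string D
ct = cbc_encrypt(rng.randrange(1 << N), pad_m3(secret))
print(get_ld_general(ct))  # 37
```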
This phase is presented in Algorithm 9797-1-m3-get-L_D-general below. The notation $X_{a,b}$ denotes the bit at position $b$ of the ciphertext or plaintext block $X_a$. We number the positions in a block from 0 to $n-1$, going from left to right.

This method of obtaining $L_D$ does not work when the unpadded data string consists only of a single block (this includes the case of the data being the null string). Here, the padded data string consists of two blocks $P_1 \| P_2$. Now flipping bits in the last (second) plaintext block would require changes in the first ciphertext block $C_1$, which in turn would corrupt the first plaintext block, where $L_D$ is supposed to reside.

Fortunately, there is a way to circumvent this problem, at least for block sizes $n = 2^m$, $m > 1$, the most common situation in practice. Let $C = IV \| C_1 \| C_2$ be the ciphertext for which we wish to determine $L_D$. It is not hard to see that if $IV' = IV \oplus 0^{n-m-1} 1 0^m$, then $C' = IV' \| C_1 \| C_1 \| C_2$ is also a valid ciphertext unless $L_D = 0$ or $L_D = n$ (in which cases the padding oracle will return INVALID on submission of $C'$). Indeed, the modified $IV'$ adds $2^m = n$ to the length field when $0 < L_D < n$, while the last block of $C'$ decrypts to the original $P_2$. In the situation where $C'$ is valid, we can simply apply the method described above to $C'$ to obtain $L_D + 2^m$, and hence $L_D$.

We need to apply a further trick to distinguish the remaining cases, i.e. when $L_D = 0$ or $L_D = n$. Now we set $IV'' = IV \oplus 0^{n-m-2} 1 1 0^m$ and submit $C'' = IV'' \| C_1 \| C_1 \| C_2$ to the padding oracle. If $L_D = 0$, then $C''$ will, on decryption, contain a length field $L''_D$ with $L''_D = 3n$. Since the unpadded data in $C''$ is of length at most $2n$, the padding oracle will output INVALID. On the other hand, if $L_D = n$, then $C''$ will yield $L''_D = 2n$ and $C''$ will be accepted as VALID. Hence one further oracle query on a carefully chosen $C''$ is sufficient to decide whether $L_D = 0$ or $L_D = n$.

The special case $q = 2$ is presented in Algorithm 9797-1-m3-get-L_D-special below.
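The special case can be simulated on the same kind of toy setup. This is a hypothetical sketch: it assumes a 16-bit block, so $n = 2^4$ and $m = 4$. It also assumes the same simulated oracle and list-based padding as before, and it repeats the phase 1 binary search so that the block is self-contained. It builds $C' = IV' \| C_1 \| C_1 \| C_2$ and, if necessary, $C''$, exactly as described.

```python
import random

N = 16                                   # toy block size n = 2^m bits (m = 4)
rng = random.Random(13)
E = list(range(1 << N)); rng.shuffle(E)  # toy e_K: a random permutation (not secure)
D = [0] * (1 << N)
for x, y in enumerate(E):
    D[y] = x

def bit(block, pos):
    return (block >> (N - 1 - pos)) & 1  # positions 0..n-1, left to right

def flip(block, pos):
    return block ^ (1 << (N - 1 - pos))

def pad_m3(data_bits):
    """ISO/IEC 9797-1 padding method 3 on a list of bits."""
    bits = data_bits + [0] * (-len(data_bits) % N)
    if not bits:
        bits = [0] * N
    blocks = [int("".join(map(str, bits[i:i + N])), 2) for i in range(0, len(bits), N)]
    return [len(data_bits)] + blocks

def cbc_encrypt(iv, blocks):
    out = [iv]
    for p in blocks:
        out.append(E[out[-1] ^ p])
    return out

def oracle(ct):
    """Hypothetical padding oracle: True iff VALID."""
    p = [D[c] ^ prev for prev, c in zip(ct, ct[1:])]
    if len(p) < 2:
        return False
    length, data = p[0], p[1:]
    if max(1, -(-length // N)) != len(data):
        return False
    return all(bit(data[pos // N], pos % N) == 0 for pos in range(length, len(data) * N))

def get_ld_general(ct):
    """Binary search of phase 1 (requires q >= 3)."""
    q, c = len(ct) - 1, list(ct)
    lo, hi = 0, N - 1
    while lo != hi:
        h = (lo + hi + 1) // 2
        c[q - 1] = flip(c[q - 1], h)
        if oracle(c):
            lo = h
        else:
            hi = h - 1
        c[q - 1] = flip(c[q - 1], h)
    return (q - 2) * N + lo + 1

def get_ld_special(ct):
    """Special case q = 2 with n = 2^m: recover L_D from IV || C_1 || C_2."""
    iv, c1, c2 = ct
    m = N.bit_length() - 1               # n = 2^m
    iv1 = iv ^ (1 << m)                  # IV' = IV xor 0^{n-m-1} 1 0^m
    c_prime = [iv1, c1, c1, c2]          # C' = IV' || C_1 || C_1 || C_2
    if oracle(c_prime):
        return get_ld_general(c_prime) - (1 << m)
    iv2 = iv ^ (3 << m)                  # IV'' = IV xor 0^{n-m-2} 11 0^m
    return N if oracle([iv2, c1, c1, c2]) else 0

for ld in (0, 5, 16):
    ct = cbc_encrypt(rng.randrange(1 << N), pad_m3([1] * ld))
    print(ld, get_ld_special(ct))        # prints "0 0", "5 5", "16 16"
```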



Phase 2: Decrypting. We now have $L_D$, the binary encoding of which is the content of the first plaintext block. We can deduce that $I_1$, the first intermediate block, is equal to $(L_D)_2 \oplus IV$. Note that by manipulating $IV$, we can change the content of the first block to indicate a data length of any desired value. If $L'_D$ is the desired value, we can take $IV' = (L'_D)_2 \oplus I_1 = (L'_D)_2 \oplus (L_D)_2 \oplus IV$.

We are now ready to decrypt an arbitrary ciphertext block $C_k$ from the ciphertext $IV \| C_1 \| C_2 \| \ldots \| C_q$, where $2 \le k \le q$ (Figure 5). Note that there is no need to decrypt $C_1$, as it just encrypts the value $L_D$. The decryption is done in a bit-by-bit fashion, starting from the rightmost bit. So to begin with we submit to the oracle the ciphertext $C' = IV' \| C_1 \| R \| C_k$, where

$$IV' = (2n-1)_2 \oplus (L_D)_2 \oplus IV,$$

and $R$ is a random $n$-bit block.

Fig. 5. Attack phase 2: decrypting

Algorithm 9797-1-m3-get-L_D-general
Input: IV‖C_1‖C_2‖…‖C_q
Output: L_D
Ensure: q ≥ 3
  C' := IV‖C_1‖C_2‖…‖C_q
  l := 0
  u := n − 1
  repeat
    h := ⌈(l + u)/2⌉
    C'_{q−1,h} := C'_{q−1,h} ⊕ 1
    if oracle(C') = VALID then
      l := h
    else
      u := h − 1
    end if
    C'_{q−1,h} := C'_{q−1,h} ⊕ 1
  until l = u
  return L_D := (q − 2)n + l + 1

Algorithm 9797-1-m3-get-L_D-special
Input: IV‖C_1‖C_2
Output: L_D
Ensure: n = 2^m, m > 1, q = 2
  IV' := IV ⊕ 0^(n−m−1) 1 0^m
  C' := IV'‖C_1‖C_1‖C_2
  if oracle(C') = VALID then
    L'_D := 9797-1-m3-get-L_D-general(C')
    return L_D := L'_D − 2^m
  else
    IV'' := IV ⊕ 0^(n−m−2) 11 0^m
    C'' := IV''‖C_1‖C_1‖C_2
    if oracle(C'') = VALID then
      return L_D := n
    else
      return L_D := 0
    end if
  end if

After decryption, $L'_D$, the length field in the resulting plaintext $P'_1 \| P'_2 \| P'_3$, points to the last-but-one bit of $P'_3$, the last block. Now the padding oracle outputs VALID for $C'$ if the last bit of $P'_3$ is equal to ‘0’, and INVALID if it is equal to ‘1’. We then have $I'_{3,n-1} = P'_{3,n-1} \oplus R_{n-1}$, and the block $I'_3 = d_K(C_k)$ is equal to the original intermediate block $I_k$. So we can obtain $P_{k,n-1} = I'_{3,n-1} \oplus C_{k-1,n-1}$.

To decrypt the next bit, we construct a new ciphertext for the oracle. We want, after decryption, the value in $L'_D$ to decrement by one, and to ensure that $P'_{3,n-1}$ is ‘0’. We can achieve the former by altering $IV'$ appropriately and the latter by keeping/flipping the last bit of $C'_2$ (that is, of $R$) if the previous response was VALID/INVALID. Submitting the resulting ciphertext to the oracle, a VALID response indicates that $P'_{3,n-2}$ equals ‘0’, and an INVALID response that it equals ‘1’. We can then compute $I'_{3,n-2} = P'_{3,n-2} \oplus R_{n-2}$. Note that the random block $R$ at this iteration may (or may not) have changed from the last iteration. We can now obtain $P_{k,n-2} = I'_{3,n-2} \oplus C_{k-1,n-2}$.

The process is repeated, decrementing $L'_D$ by one per iteration while making sure the bit positions in $P'_3$ corresponding to those we have obtained stay at ‘0’. One bit of $I'_3$ and one bit of $P_k$ are obtained at each iteration, and we stop after $n-1$ iterations, when the $n-1$ rightmost bits of those blocks are determined. We cannot get the leftmost bit of the block $I'_3$ (hence $P_{k,0}$) using this approach, because at the next step $L'_D$ would indicate a length of $n$, a multiple of the block size, and according to the standard, we would never append a new block in such cases.

Instead, we extract this leftmost bit $P_{k,0}$ by using a different approach. We assume that standard binary encoding is used for length information, with the least significant bit in the rightmost position. (A similar attack can be mounted in the opposite situation too, but we omit the details.) Consider the ciphertext $C' = IV' \| C'_1 \| R$, where $IV' = C_{k-1} \oplus 0P_{k,1}P_{k,2}\cdots P_{k,n-1} \oplus (n)_2$, $C'_1 = C_k$, and $R$ is a random $n$-bit block. This ciphertext is constructed in such a way that the length field is equal to $P_{k,0}0\cdots0 \oplus (n)_2$, indicating a length of either $n$ or $n + 2^{n-1}$, depending on the value of $P_{k,0}$; since $C'$ carries a single data block, only the length $n$ is consistent with it. So if $C'$ is submitted to the oracle, then an output of VALID (INVALID) tells us that $P_{k,0} = 0$ ($P_{k,0} = 1$, respectively).

We summarise the decryption phase as the pair of algorithms 9797-1-m3-decrypt and 9797-1-m3-decrypt-last-bit below. In these algorithms, $\Pi$ is the function which takes as input a ciphertext $C'$ and is defined as:

$$\Pi(C') = \begin{cases} 0 & \text{if the padding oracle returns VALID for input } C', \\ 1 & \text{if the padding oracle returns INVALID for input } C'. \end{cases}$$
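Putting the two decryption algorithms together gives a runnable sketch on the same kind of toy setup. As before, this is a hypothetical illustration: a 16-bit random permutation replaces $e_K$, and the oracle is simulated. $L_D$ is taken as already known from phase 1. `pi`, `decrypt_block` and `decrypt_last_bit` are names chosen for this note. The sketch recovers every data block using only oracle answers and the ciphertext.

```python
import random

N = 16                                   # toy block size n in bits
rng = random.Random(17)
E = list(range(1 << N)); rng.shuffle(E)  # toy e_K: a random permutation (not secure)
D = [0] * (1 << N)
for x, y in enumerate(E):
    D[y] = x

def bit(block, pos):
    return (block >> (N - 1 - pos)) & 1  # positions 0..n-1, left to right

def pad_m3(data_bits):
    bits = data_bits + [0] * (-len(data_bits) % N) or [0] * N
    return [len(data_bits)] + [int("".join(map(str, bits[i:i + N])), 2)
                               for i in range(0, len(bits), N)]

def cbc_encrypt(iv, blocks):
    out = [iv]
    for p in blocks:
        out.append(E[out[-1] ^ p])
    return out

def oracle(ct):
    """Hypothetical padding oracle for ISO/IEC 9797-1 method 3: True iff VALID."""
    p = [D[c] ^ prev for prev, c in zip(ct, ct[1:])]
    length, data = p[0], p[1:]
    if not data or max(1, -(-length // N)) != len(data):
        return False
    return all(bit(data[pos // N], pos % N) == 0 for pos in range(length, len(data) * N))

def pi(ct):
    """Pi(C'): 0 if the padding oracle returns VALID, 1 if it returns INVALID."""
    return 0 if oracle(ct) else 1

def decrypt_block(ld, iv, c1, c_prev, c_k):
    """Recover bits 1..n-1 of P_k (as a dict position -> bit), rightmost first."""
    r = rng.randrange(1 << N)                     # the random block R
    bits = {}
    for j in range(N - 1, 0, -1):                 # j = n-1 down to 1
        iv_p = iv ^ ld ^ (N + j)                  # length field of P'_1 becomes n + j
        b = pi([iv_p, c1, r, c_k])                # C' = IV' || C_1 || R || C_k
        bits[j] = b ^ bit(r, j) ^ bit(c_prev, j)  # P_{k,j} = b xor R_j xor C_{k-1,j}
        r ^= b << (N - 1 - j)                     # force bit j of P'_3 to '0'
    return bits

def decrypt_last_bit(c_prev, c_k, bits):
    """Recover P_{k,0} once P_{k,1..n-1} are known."""
    known = sum(bits[j] << (N - 1 - j) for j in range(1, N))  # 0 P_{k,1} ... P_{k,n-1}
    iv_p = c_prev ^ known ^ N                     # length field: P_{k,0} 0...0 xor (n)_2
    return pi([iv_p, c_k, rng.randrange(1 << N)]) # C' = IV' || C_k || R

secret = [rng.randrange(2) for _ in range(45)]
plain = pad_m3(secret)
ct = cbc_encrypt(rng.randrange(1 << N), plain)
ld = len(secret)                                  # known after phase 1
recovered = []
for k in range(2, len(ct)):                       # ct[k] is C_k
    bits = decrypt_block(ld, ct[0], ct[1], ct[k - 1], ct[k])
    bits[0] = decrypt_last_bit(ct[k - 1], ct[k], bits)
    recovered.append(sum(bits[j] << (N - 1 - j) for j in range(N)))
print(recovered == plain[1:])  # True
```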



Complexity. Phase 1, in the general case ($q \ge 3$), should take no more than $\log_2 n$ oracle calls using binary search. To decrypt many messages encrypted under a fixed key $K$, this phase only needs to be performed once. Phase 2 takes one oracle call per plaintext bit, thus $n$ calls per plaintext block. For the special case $q = 2$, one further oracle call is required in situations where $L_D = 0$ or $L_D = n$ to distinguish between them (no further oracle calls are needed otherwise).

Fewer than $\log_2(n) + 1 + (q-1)n$ oracle calls are needed to recover all the bits of plaintext from a $q$-block ciphertext (remember that the first block contains $L_D$, which is not part of the unpadded data string, and its value is already known after phase 1 anyway).
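For a sense of scale, the bound above can be evaluated for some typical parameters. The parameter choices below, 128-bit blocks with a 10-block ciphertext and 64-bit blocks with a 5-block ciphertext, are illustrative assumptions, and the helper name is hypothetical.

```python
import math

def max_oracle_calls(n, q):
    """Bound from the text: log2(n) + 1 + (q - 1) * n calls for a q-block ciphertext."""
    return math.log2(n) + 1 + (q - 1) * n

print(max_oracle_calls(128, 10))  # 1160.0
print(max_oracle_calls(64, 5))    # 263.0
```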

Optimality. The oracle returns one bit of information per use, so $(q-1)n$ is information-theoretically the smallest number of oracle calls needed to recover $(q-1)n$ bits of plaintext entropy. Hence our attack makes nearly optimal use of the padding oracle, especially when many ciphertexts are decrypted for the same key $K$.

Algorithm 9797-1-m3-decrypt
Input: L_D, IV, C_1, C_{k−1}, C_k
Output: P_{k,1}P_{k,2}⋯P_{k,n−1}, the rightmost n − 1 bits of P_k
  R := a random n-bit block
  for j := n − 1 downto 1 do
    IV' := IV ⊕ (L_D)_2 ⊕ (n + j)_2
    C' := IV'‖C_1‖R‖C_k
    b := Π(C')
    P_{k,j} := b ⊕ R_j ⊕ C_{k−1,j}
    R := R ⊕ 0^j b 0^(n−j−1)
  end for
  return P_{k,1}P_{k,2}⋯P_{k,n−1}

Algorithm 9797-1-m3-decrypt-last-bit
Input: C_{k−1}, C_k, P_{k,1}P_{k,2}⋯P_{k,n−1}
Output: P_{k,0}, the leftmost bit of P_k
  R := a random n-bit block
  IV' := C_{k−1} ⊕ 0P_{k,1}P_{k,2}⋯P_{k,n−1} ⊕ (n)_2
  C' := IV'‖C_k‖R
  P_{k,0} := Π(C')
  return P_{k,0}



4 Attacking the Padding Methods of ISO/IEC 10118-1 

4.1 The Standard 

ISO/IEC 10118 is a standard for hash functions, Part 1 [6] of which describes the general construction of a hash function.

Padding methods 1 and 2 in this standard are identical to the respective 
methods in ISO/IEC 9797-1 which were already discussed in the previous section. 
We focus instead on padding method 3 of [6]. 



4.2 Padding Method 3 

In the standard, $L_1$ is used to denote the block size. It will henceforth be replaced by our usual notation $n$ to be consistent with the rest of this paper. The method is as follows (Figure 6):

“This padding method requires the selection of a parameter $r$ (where $r \le n$), e.g. $r = 64$, and a method of encoding the bit length of the data $D$, i.e. $L_D$, as a bit string of length $r$. The choice for $r$ will limit the length of $D$, in that $L_D < 2^r$.

“The data D [...] is padded using the following procedure.

1. D is concatenated with a single ‘1’ bit. 
[Figure: DATA, followed by 10…0 (between 1 and n bits), followed by the r-bit encoding of $L_D$]

Fig. 6. ISO/IEC 10118-1 padding method 3



2. The result of the previous step is concatenated with between zero
and $n-1$ ‘0’ bits, such that the length of the resultant string is
congruent to $n-r$ modulo $n$. The result will be a bit string whose
length will be $r$ bits short of an integer multiple of $n$ bits (in the
case $r = n$, the result will be a bit string whose length is an exact
multiple of $n$ bits).

3. Append an $r$-bit encoding of $L_D$, using the selected encoding method,
yielding the padded version of $D$.”

The above description can be summarised as “pad a ‘1’ followed by the
smallest number of ‘0’s needed to push the $r$ bits of $L_D$ right to the end of a
block.” Using this method, the padded bits for data string $D$ are appended in
one of two ways:

Same-block ($L_D \bmod n \le n - r - 1$). The last block has enough space after
the last plaintext bit to contain at least a single ‘1’ bit and the $r$ bits of $L$,
the length block that holds $L_D$. The number of padded bits is between $r + 1$
and $n - 1$.

Cross-block ($L_D \bmod n \ge n - r$). The last block does not have enough space
to contain a ‘1’ bit and the $r$ bits of $L$. The number of padded bits is between
$n$ and $n + r$, and the padding extends over two blocks. Note that this will
always be the case when $r = n$.
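As a concrete illustration, here is a minimal Python sketch of the three padding steps and of the same-block/cross-block test above. It is not taken from the standard: bit strings are modelled as Python strings of ‘0’/‘1’, base-2 length encoding is assumed (as in the rest of this section), the function names are hypothetical, and `is_valid_method3` is our own reading of what a decryptor checking this padding would accept.

```python
def pad_method3(data: str, n: int, r: int) -> str:
    """Pad a bit string with ISO/IEC 10118-1 padding method 3 (base-2 length field assumed)."""
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    if len(data) >= 2 ** r:
        raise ValueError("L_D must be smaller than 2^r")
    padded = data + "1"                           # step 1: a single '1' bit
    padded += "0" * ((n - r - len(padded)) % n)   # step 2: 0..n-1 zeros, length = n - r (mod n)
    padded += format(len(data), f"0{r}b")         # step 3: r-bit encoding of L_D
    return padded


def padding_case(data_len: int, n: int, r: int) -> str:
    """Classify the padding using the same-block / cross-block conditions stated above."""
    return "same-block" if data_len % n <= n - r - 1 else "cross-block"


def is_valid_method3(bits: str, n: int, r: int) -> bool:
    """What a padding oracle checks: bits is exactly pad_method3(data) for some data."""
    if not bits or len(bits) % n:
        return False
    L = int(bits[-r:], 2)                         # the decoded length field
    return L <= len(bits) - r - 1 and pad_method3(bits[:L], n, r) == bits


# Example with n = 8, r = 4
print(pad_method3("101", 8, 4), padding_case(3, 8, 4))    # 10110011 same-block
print(pad_method3("10101", 8, 4), padding_case(5, 8, 4))  # 1010110000000101 cross-block
```

In the cross-block example the five data bits leave only three free bits in the first block, so the ‘1’ and the zeros spill into a second block that ends with the 4-bit length field `0101`.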

We have identified two attacks against this method, though they are to some
degree dependent on each other. Note that no encoding method (for $L_D$) is
specified in the standard. Our attacks work no matter which encoding method
is used, though the attacker needs to know this method. We expect that base-2
encoding will be used in most cases, and it will be used for illustrative purposes
henceforth.



Attack 1: Directed IV search. This attack works against any block $C_k$
of the ciphertext $IV\|C_1\|C_2\|\ldots\|C_q$ and recovers the corresponding plaintext
block using on average $2^{r-1} + 2^{2r-n+1}$ and at most $2^r + 3 \cdot 2^{2r-n-1}$ padding
oracle queries, provided $r < n - 1$. When $r = n$, the attack requires on average
$2^n$ and at most $2^{n+1}$ oracle queries.
Fig. 7. Directed IV search (all values of the rightmost $r$ bits of $IV'$ are traversed, which succeeds with high probability; each candidate is put to the oracle: ‘Is correctly padded?’)



We first consider the case $r < n - 1$. We submit to the oracle strings of the
form

$$IV'\|C'_1$$

where $IV'$ is a specially selected initialisation vector and $C'_1 = C_k$, hoping that
the oracle returns VALID. This situation is depicted in Figure 7. If it does, then
the plaintext block $P_k$ can be extracted using Attack 2 below on the ciphertext
$IV'\|C'_1$. The overall complexity will be the sum of the two attacks’ complexities,
and will be dominated by the complexity of this first phase.

How then should $IV'$ be selected? Notice that there is a probability of $1 - 2^{r-n}$
that there is a ‘1’ somewhere in the leftmost $n - r$ bits of $P_k$. Thus, if we
traverse through all $2^r$ possible settings of the rightmost $r$ bits of $IV'$, then with
probability $1 - 2^{r-n}$ we will obtain (at least) one VALID reply from the padding
oracle. The expected number of oracle queries in this situation is therefore $2^{r-1}$.
But with probability $2^{r-n}$, all replies will be INVALID. Now if we flip the bit in
position $n - r - 1$ of $IV'$ and repeat the above process, it is easy to see that we
are guaranteed to obtain at least one VALID response. A simple analysis shows
that the number of oracle queries needed is equal to $2^{r-1} + 2^{2r-n+1}$ on average
and is always at most $2^r + 3 \cdot 2^{2r-n-1}$.

An algorithm for Attack 1 in the case $r < n - 1$ is given in Algorithm
10118-1-m3-a1-general.
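To make the search concrete, the following minimal Python sketch simulates Attack 1 for $r < n - 1$ against a toy oracle. The unknown block-cipher decryption of $C_k$ is replaced by a hidden random $n$-bit value (an assumption of the simulation, not a real cipher), blocks are Python integers whose most significant bit is the leftmost bit, and all names are hypothetical. As in Algorithm 10118-1-m3-a1-general, a counter is XORed into the rightmost $r + 1$ bits of a random $IV_0$, so bit $n - r - 1$ takes both values.

```python
import random


def toy_oracle(hidden: int, n: int, r: int):
    """Hypothetical padding oracle for the one-block ciphertext IV'||C_k.

    `hidden` stands in for the decryption of C_k under the unknown key, so the
    plaintext seen by the oracle is hidden XOR IV'. Bits are numbered from the left."""
    def oracle(iv_prime: int) -> bool:
        bits = format(hidden ^ iv_prime, f"0{n}b")
        length = int(bits[-r:], 2)              # the r-bit length field
        if length > n - r - 1:                  # data, a '1' and the field must fit in one block
            return False
        # a '1' right after the data, then only '0's up to the length field
        return bits[length] == "1" and "1" not in bits[length + 1 : n - r]
    return oracle


def directed_iv_search(oracle, n: int, r: int, rng: random.Random):
    """XOR a counter into the rightmost r + 1 bits of a random IV_0 until VALID."""
    iv0 = rng.getrandbits(n)
    for i in range(2 ** (r + 1)):
        iv_prime = iv0 ^ i
        if oracle(iv_prime):
            return iv_prime, i + 1              # IV' and the number of queries used
    raise RuntimeError("no VALID reply; can only happen if n - r - 1 >= 2**r")


rng = random.Random(1)
n, r = 32, 8
oracle = toy_oracle(rng.getrandbits(n), n, r)
iv_prime, queries = directed_iv_search(oracle, n, r, rng)
print(queries <= 2 ** (r + 1), oracle(iv_prime))   # True True
```

The search always terminates here because, for one of the two values of bit $n - r - 1$, that bit is a ‘1’ and the length field $n - r - 1$ is among the traversed settings.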

Next we consider the case $r = n$. Here a valid plaintext must be at least two
blocks in length, and a three-block ciphertext is required to perform the attack.
Instead of modifying only the initialisation vector as before, we now also change
the random block $R$ at each iteration. The most likely valid two-block plaintext
to obtain at random is

$$\underbrace{x_0x_1\ldots x_{n-2}1}_{n}\,\|\,\underbrace{(L_D = n-1)_2}_{n}$$

where each $x_i$ can be either ‘0’ or ‘1’. A valid two-block plaintext is guaranteed
to occur if we traverse through all $2^{n+1}$ possible settings of the second plaintext
block along with the rightmost bit of the first plaintext block (by changing $R$ and
the rightmost bit of $IV'$, respectively), so on average this strategy has a
complexity of $2^n$ oracle calls.

This special case is illustrated in Algorithm 10118-1-m3-a1-special.

Algorithm 10118-1-m3-a1-general
Input: C_k, n, r
Output: IV' such that IV'||C_k is a valid ciphertext
Ensure: 1 ≤ r < n
  IV_0 := a random n-bit block
  IV' := 0...0                       (n bits)
  i := 0
  repeat
    IV' := IV_0 ⊕ 0...0(i)_2         (n - r - 1 zero bits, then i encoded in r + 1 bits)
    C' := IV'||C_k
    i := i + 1
  until oracle(C') = VALID
  return IV'

Algorithm 10118-1-m3-a1-special
Input: C_k, n, r
Output: IV', R such that IV'||R||C_k is a valid ciphertext
Ensure: r = n
  IV_0 := a random n-bit block
  R_0 := a random n-bit block
  for i := 0 to 2^n - 1 do
    R := R_0 ⊕ (i)_2                 (i encoded in n bits)
    for j := 0 to 1 do
      IV' := IV_0 ⊕ 0...0j           (n - 1 zero bits, then the bit j)
      C' := IV'||R||C_k
      if oracle(C') = VALID then
        return IV', R
      end if
    end for
  end for

Decrypting. Once we have a valid padding we can employ Attack 2 below with
input a valid ciphertext of the form $IV'\|C_k$ (when $r < n - 1$) or $IV'\|R\|C_k$ (when
$r = n$).

We consider first the case $r < n - 1$. Here the plaintext corresponding to the
ciphertext submitted to Attack 2 will always be same-block padded (because it
only contains one block). Then Attack 2 will efficiently recover the entire last
plaintext block for this ciphertext, which we denote by $P'_1$. $P'_1$ will in general
consist of data bits, padding bits and length information. From it, it is trivial
to recover $P_k$ (the plaintext block that we are actually after). We have:

$$P_k = P'_1 \oplus IV' \oplus C_{k-1}.$$
For the case $r = n$, the first phase of Attack 2 below efficiently recovers the
length of the unpadded data for the valid ciphertext $IV'\|R\|C_k$. This information
is contained in the length field, which occupies all of $P'_2$, the second plaintext block
for this ciphertext. Thus after the first phase of Attack 2, $P'_2$ is known. Now $P_k$
can be recovered from:

$$P_k = P'_2 \oplus R \oplus C_{k-1}.$$
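Both recovery formulas are instances of the CBC decryption rule $P_i = D_K(C_i) \oplus C_{i-1}$, where $D_K$ denotes block-cipher decryption under the unknown key $K$ (notation introduced here for the derivation). Because the submitted ciphertext reuses $C_k$ with a different predecessor block, the unknown value $D_K(C_k)$ cancels:

$$
\begin{aligned}
P'_1 = D_K(C_k) \oplus IV' \ \text{and}\ P_k = D_K(C_k) \oplus C_{k-1} &\;\Longrightarrow\; P_k = P'_1 \oplus IV' \oplus C_{k-1},\\
P'_2 = D_K(C_k) \oplus R \ \text{and}\ P_k = D_K(C_k) \oplus C_{k-1} &\;\Longrightarrow\; P_k = P'_2 \oplus R \oplus C_{k-1}.
\end{aligned}
$$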



Complexity. Obtaining a valid plaintext block takes on average $2^{r-1} + 2^{2r-n+1}$
oracle calls when $r < n - 1$ and on average $2^n$ oracle calls in the case $r = n$.
Our use of Attack 2 below has a complexity of $n + O(\log_2 n)$ oracle calls for all
values of $r$ (recall that for $r = n$, only the first phase of Attack 2 is needed,
while for $r < n - 1$, the plaintext is same-block padded, in which case Attack
2 is efficient). Thus our use of Attack 2 does not contribute significantly to the
overall complexity to decrypt a single block.

Impact. This attack applies to any ciphertext block, and all $n$ bits within the
block are recovered. For many choices of $r$ this attack is many orders of magnitude
faster than an exhaustive key search, and for a small enough $r$ this attack will be
practical whenever a padding oracle is available. When $r = n$, our attack is still
better than an exhaustive key search for block ciphers whose key size is greater than
the block length. It is interesting to note that the parameter $r$, seemingly innocent
of any security implications, turns out not to be so at all.



Attack 2: Attacking the last block(s). This attack is conceptually similar
to the one against padding method 3 of ISO/IEC 9797-1 given above: there are
two phases, the first of which determines $L_D$ and the second of which recovers
any plaintext that is found in a “mixed” block, that is, a block that consists
of both data and padding bits. There is obviously at most one such block in
any plaintext padded using this padding method, which is either the last block
or the one that immediately precedes it. If the padding ends exactly on a block
boundary, then our attack does not recover any (unpadded) plaintext.

Obtaining $L_D$. We want to know $L_D$, the data length. For ease of presentation
we first examine the case $r < n - 2$, but our algorithm to follow handles all
values of $r$. Here, in the same-block padded case, the last plaintext block $P_q$
corresponding to the last ciphertext block $C_q$ in the ciphertext $IV\|C_1\|C_2\|\ldots\|C_q$
has the following format:

$$\underbrace{[\mathrm{DATA}]}_{t}\,\underbrace{10\ldots0}_{p}\,\underbrace{(L_D)_2}_{r}$$

where $t + p + r = n$ and $p \ge 1$. In the cross-block padded case, the above format
spans the last two blocks $P_{q-1}$ and $P_q$, and we put $t + p + r = 2n$. We note that
the attacker does not at first know which of the cases he is faced with.
Given a $q$-block ciphertext, we want to flip the plaintext bit $P_{q,n-r-2}$, the
rightmost position at which a data bit could ever reside, given $q$ and our
assumption on $r$. We submit to the padding oracle the ciphertext

$$IV\|C_1\|C_2\|\ldots\|\big(C_{q-1} \oplus \underbrace{0\ldots0}_{n-r-2}1\,0\,\underbrace{0\ldots0}_{r}\big)\|C_q.$$

(Recall that $C_0 = IV$, so the case where $q = 1$ is included here.)

Upon submission of the above ciphertext, the oracle will return:

- VALID, meaning the padding has not been disturbed, so the bit flipped is a
data bit. Since this bit is at the rightmost possible data bit position, we can
deduce the data length $L_D = (q-1)n + n - r - 1$. Or else,

- INVALID, meaning a padding bit has been flipped, so the padding is no longer
valid. Therefore the padding boundary is somewhere to the left of this bit, so
we continue by resetting this bit and flipping the bit immediately to the left,
and test the resulting ciphertext for padding correctness. We repeat this,
flipping bits further and further to the left (and into the previous block if
necessary) until the first time the oracle returns VALID. This indicates that
the tested bit is the last data bit, and $L_D$ is determined accordingly.

One might worry about instances when cross-block padding arises, where 
flipping bits in the last plaintext block (by flipping bits in the last-but-one ci- 
phertext block) would turn the last-but-one plaintext block into “garbage” and 
along with it, potentially, any padding bits within it, so the oracle might report 
INVALID for the wrong bits. On closer inspection, however, this turns out not to 
be an issue because all we want to know is whether the padding boundary is to 
the left or right of the bit in question. Even if the oracle does report INVALID for 
the wrong bits, it does still imply the boundary is to the left, and VALID would 
just mean that unpadded data bits have been corrupted so the boundary is still 
to the right. 

A binary search can also be applied here: for any single flipped bit, a VALID
response means the start of the padding is to the right of this bit, whereas
INVALID means it is to the left. This speed-up is made in Algorithm
10118-1-m3-a2-get-LD below.
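The binary search can be tried out with the minimal Python simulation below. It is a sketch under stated assumptions, not the authors' code: the plaintext is a ‘0’/‘1’ string, and flipping bit $h \bmod n$ of ciphertext block $\lfloor h/n \rfloor$ (block 0 being $IV$) is modelled as flipping plaintext bit $h$ and replacing the preceding plaintext block with random garbage, which is the CBC side effect discussed above. The names are hypothetical, and the lower bound is clamped at 0 so that one-block messages ($q = 1$) also work in this sketch.

```python
import random


def pad_method3(data: str, n: int, r: int) -> str:
    padded = data + "1"
    padded += "0" * ((n - r - len(padded)) % n)
    return padded + format(len(data), f"0{r}b")


def is_valid_method3(bits: str, n: int, r: int) -> bool:
    """Padding-method-3 check: '1' after the data, 0..n-1 zeros, r-bit length field."""
    if not bits or len(bits) % n:
        return False
    L = int(bits[-r:], 2)
    zeros = (n - r - L - 1) % n
    if L + 1 + zeros + r != len(bits):
        return False
    return bits[L] == "1" and "1" not in bits[L + 1 : L + 1 + zeros]


def make_flip_oracle(plain: str, n: int, r: int, rng: random.Random):
    """Toy CBC model of 'flip plaintext bit h, then ask the padding oracle'."""
    def oracle(h: int) -> bool:
        bits = list(plain)
        bits[h] = "1" if bits[h] == "0" else "0"
        b = h // n                                  # 0-based plaintext block holding bit h
        if b >= 1:                                  # the block before it decrypts to garbage
            bits[(b - 1) * n : b * n] = [rng.choice("01") for _ in range(n)]
        return is_valid_method3("".join(bits), n, r)
    return oracle


def get_LD(q: int, n: int, r: int, oracle):
    lo = max(0, (q - 2) * n + n - r)                # smallest possible L_D
    hi = (q - 1) * n + n - r - 1                    # largest possible L_D
    calls = 0
    while lo < hi:
        h = (lo + hi) // 2
        calls += 1
        if oracle(h):                               # VALID: bit h is data, so L_D > h
            lo = h + 1
        else:                                       # INVALID: bit h is padding, so L_D <= h
            hi = h
    return lo, calls


rng = random.Random(7)
n, r = 16, 8
plain = pad_method3("".join(rng.choice("01") for _ in range(37)), n, r)
print(*get_LD(len(plain) // n, n, r, make_flip_oracle(plain, n, r, rng)))   # 37 4
```

Because the search interval always holds exactly $n$ candidate lengths, the loop needs $\log_2 n$ oracle calls when $n$ is a power of two (4 calls for $n = 16$).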

We are now ready for the decryption stage. Same-block and cross-block
padded messages are treated differently; recall that knowledge of $L_D$ indicates
which case the attacker is faced with.

Decrypting: Same-block. Recall the structure of the last plaintext block: $t$
data bits, followed by $p$ padding bits in the form $10\ldots0$ and finally $r$ bits of an
encoding of data length $L_D$. We can recover the remaining $t$ bits of the plaintext
in the last block, again using a method similar to the decryption phase of the attack
on ISO/IEC 9797-1 method 3. We submit to the oracle $IV'\|C'_1$, where $C'_1 = C_q$
and

$$IV' = C_{q-1} \oplus \underbrace{0\ldots0}_{n-r}(L_D)_2 \oplus \underbrace{0\ldots0}_{t}\underbrace{10\ldots0}_{p}(t-1)_2.$$
Algorithm 10118-1-m3-a2-get-LD
Input: IV||C_1||C_2||...||C_q, n, r
Output: L_D
  C' := IV||C_1||C_2||...||C_q
  l := (q - 2)n + n - r
  u := (q - 1)n + n - r - 1
  repeat
    h := ⌊(l + u)/2⌋
    C'_{⌊h/n⌋, h mod n} := C'_{⌊h/n⌋, h mod n} ⊕ 1
    if oracle(C') = VALID then
      l := h + 1
    else
      u := h
    end if
    C'_{⌊h/n⌋, h mod n} := C'_{⌊h/n⌋, h mod n} ⊕ 1
  until l = u
  return L_D := l



After decryption, the length block in the plaintext block $P'_1$ should have the
value $t - 1$, which points to the last-but-one bit of the original data sub-block,
with the middle padding sub-block being all ‘0’s. A VALID response means the
last ($t$-th) data bit in $P'_1$ is a ‘1’, and ‘0’ otherwise.

By decrementing the length field sub-block in $P'_1$ one by one whilst keeping
all recovered bit positions ‘0’, a single bit is revealed at each iteration until the
whole block is recovered. We can now compute the intermediate block $I'_1$ by
XORing the final $IV'$ with $P'_1$, and then by XORing $I'_1$ with $C_{q-1}$ we get the
original last plaintext block.

This decryption procedure is presented in Algorithm 10118-1-m3-decrypt-same-block
below.
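A minimal Python sketch of this same-block decryption, run against a simulated oracle, shows that one oracle call per data bit suffices. Assumptions of the sketch: blocks are integers whose most significant bit is the leftmost plaintext bit, `hidden` stands in for the unknown decryption of $C_q$, the oracle returns `True` for VALID, and $L_D < 2^r$ so that base-2 encoding applies. The running variable `mask` plays the role of the copy of $C_{q-1}$ whose recovered positions are cleared after each step; all names are hypothetical.

```python
import random


def is_valid_method3(bits: str, n: int, r: int) -> bool:
    """Padding-method-3 check: '1' after the data, 0..n-1 zeros, r-bit length field."""
    if not bits or len(bits) % n:
        return False
    L = int(bits[-r:], 2)
    zeros = (n - r - L - 1) % n
    if L + 1 + zeros + r != len(bits):
        return False
    return bits[L] == "1" and "1" not in bits[L + 1 : L + 1 + zeros]


def decrypt_same_block(LD: int, c_prev: int, oracle, n: int, r: int) -> str:
    """Recover the last plaintext block P_q, one data bit per oracle call."""
    t = LD % n                                  # data bits in the last block
    p = n - t - r                               # padding bits 10...0
    pad_one = 1 << (n - 1 - t)                  # the padding '1' sits at bit t (from the left)
    mask = c_prev                               # working copy of C_{q-1}
    data = [0] * t
    for j in range(t - 1, -1, -1):
        # cancel (L_D)_2 and the padding '1', then make the length field read (j)_2
        iv_prime = mask ^ LD ^ pad_one ^ j
        b = 1 if oracle(iv_prime) else 0        # VALID <=> bit j of P'_1 is '1'
        pos = n - 1 - j                         # integer position of bit j (from the left)
        data[j] = b ^ ((mask >> pos) & 1) ^ ((c_prev >> pos) & 1)
        mask ^= b << pos                        # keep recovered positions '0' in P'_1
    return "".join(map(str, data)) + "1" + "0" * (p - 1) + format(LD, f"0{r}b")


rng = random.Random(3)
n, r, q, t = 32, 8, 3, 13
LD = (q - 1) * n + t                            # 77 < 2**8, so the length fits in r bits
P_q = "".join(rng.choice("01") for _ in range(t)) + "1" + "0" * (n - t - r - 1) + format(LD, f"0{r}b")
c_prev = rng.getrandbits(n)                     # C_{q-1}
hidden = int(P_q, 2) ^ c_prev                   # stands in for D_K(C_q), unknown to the attacker
oracle = lambda iv: is_valid_method3(format(hidden ^ iv, f"0{n}b"), n, r)
print(decrypt_same_block(LD, c_prev, oracle, n, r) == P_q)   # True
```

Clearing each recovered bit (the `mask ^= ...` line) is what keeps the bits to the right of position $j$ at ‘0’, so each query isolates exactly one unknown bit.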

Decrypting: Cross-block. For cross-block padded plaintexts, $P_q$ is determined
completely by $L_D$ and the padding. However, the padding extends into
the penultimate plaintext block $P_{q-1}$. Suppose $u$ bits of padding are present in
$P_{q-1}$. Then we show how to decrypt $C_{q-1}$ using Attack 1 above, but with a
speed-up factor of $2^{u-1}$.

Let $v = L_D \bmod n$; then the number of known plaintext bits $u$ is equal to
$n - v$, and those bits are of the form $\underbrace{10\ldots0}_{u}$. If we submit the ciphertext $IV'\|C'_1$
to the oracle, where

$$IV' = C_{q-2} \oplus \underbrace{0\ldots0}_{n-u}\underbrace{10\ldots0}_{u} \oplus \underbrace{0\ldots0}_{n-r}(n-r-1)_2$$
Algorithm 10118-1-m3-decrypt-same-block
Input: L_D, IV, C_{q-1}, C_q, r, n
Output: P_q = P_{q,0}P_{q,1}...P_{q,t-1} 10...0 (L_D)_2   (p padding bits, r length bits)
Ensure: L_D indicates that the plaintext is same-block padded
  C'_1 := C_q
  t := L_D mod n
  M := C_{q-1}                       (working copy; recovered positions are cleared here)
  for j := t - 1 downto 0 do
    IV' := M ⊕ 0...0(L_D)_2 ⊕ 0...0 10...0 (j)_2
                                     (n - r zero bits before (L_D)_2; t zero bits,
                                      then p padding bits, then (j)_2 in r bits)
    C' := IV'||C'_1
    b := O(C') ⊕ 1
    P_{q,j} := b ⊕ M_j ⊕ C_{q-1,j}
    M_j := M_j ⊕ b
  end for
  return P_q := P_{q,0}P_{q,1}...P_{q,t-1} 10...0 (L_D)_2



and $C'_1 = C_{q-1}$, then we only need to go through all $2^{r-u+1}$ settings of the
$r - u + 1$ bits to the left of the $u$ known bits (by changing $IV'$) to guarantee a valid
plaintext. This strategy takes on average $2^{r-u}$ oracle calls, which is a fraction
$2^{-(u-1)}$ of the original $2^{r-1}$ oracle calls for Attack 1 without the knowledge of
the $u$ padding bits.
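The claimed speed-up follows directly from these counts. Traversing $2^{r-u+1}$ settings costs half of that on average, and the ratio to the unaided Attack 1 is shown below; the numbers in the example ($r = 64$, $u = 9$) are purely illustrative:

$$\tfrac{1}{2}\cdot 2^{r-u+1} = 2^{r-u}, \qquad \frac{2^{r-u}}{2^{r-1}} = 2^{(r-u)-(r-1)} = 2^{-(u-1)}, \qquad \text{e.g. } \frac{2^{64-9}}{2^{64-1}} = \frac{2^{55}}{2^{63}} = 2^{-8}.$$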

Complexity. It takes $\log_2 n$ oracle calls to find $L_D$. For same-block padded
plaintexts, it takes one call per bit for decrypting. So to recover the $t$ data bits
of the last block, $t + \log_2 n$ oracle calls are required.

For cross-block padded plaintexts, on average $2^{r-u}$ oracle calls are needed
to recover the whole of the penultimate plaintext block $P_{q-1}$, where $u$ is the
number of known bits from finding $L_D$.

Impact. The attack is highly efficient in terms of oracle queries at extracting
plaintext bits from the last plaintext block $P_q$. A maximum of $n - r - 1$ bits
of data can be recovered in this way, and the attack is therefore significant for
short messages, especially in combination with a small $r$. One might argue that
$r = n$ is a natural choice for the implementor. In this case, the padding is always
cross-block and the attacker must resort to the sped-up version of Attack 1.

5 Conclusions 

We argue that, at least for the CBC mode of operation for a block cipher
standard, it is not good enough just to standardise the mode; an entire specification
handling bit-level computations is needed, which necessarily includes padding
issues. Padding methods devised for hashing or MACs, as we have shown, may
not be suited to encryption operations, where a different adversarial model may
be applicable.

We also make the point that there is a need for careful consideration of the 
potential for side-channel cryptanalysis for cryptographic primitives and security 
protocols in their design phase. Designs should be fully specified so as to allow as 
little room as possible for the implementor to take potentially weak approaches 
during implementation. 

We agree with the argument in Section 7 of [1] for the practice of the encryp- 
tion being accompanied by strong integrity checks when possible and appropri- 
ate. Such “authenticated encryption” would, within the context of this paper, 
prevent any practical attempts at constructing a valid ciphertext which in turn 
precludes the existence of a padding oracle, and hence all the associated attacks 
that we have discovered. 
Abstract. Hash functions are among the most widespread cryptographic
primitives, and are currently used in multiple cryptographic schemes and security
protocols, such as IPSec and SSL. In this paper, we investigate a new hardware
architecture for a family of dedicated hash functions, including the American
standards SHA-1 and SHA-512. Our architecture is based on unrolling several
message digest steps and executing them in one clock cycle. This modification
permits implementing the majority of dedicated hash functions with a throughput
exceeding 1 Gbit/s using medium-size Xilinx Virtex FPGAs. In particular, our
new architecture has enabled us to speed up the implementation of SHA-1
compared to the basic iterative architecture from 544 Mbit/s to 1 Gbit/s using
Xilinx XCV1000. The implementation of SHA-512 has been sped up from 717
to 929 Mbit/s for Virtex FPGAs, and exceeded 1 Gbit/s for Virtex-E Xilinx
FPGAs.



1 Introduction 

Hash functions are very common and important cryptographic primitives. Their pri- 
mary application is their use for message authentication, integrity, and non- 
repudiation as a part of the Message Authentication Codes (MACs) and digital signa- 
tures [1]. 

The current American federal standard, FIPS 180-2, recommends the use of one of
the four hash functions developed by the National Security Agency (NSA) and approved
by NIST. By far the most widely used of these four functions is SHA-1 (Secure Hash
Algorithm-1), a revised version of the standard algorithm introduced in 1993. The
best attack against this algorithm is in the range of $2^{80}$ operations, which makes its
security equivalent to the security of Skipjack and the Digital Signature Standard (DSS).
After the introduction of a new secret-key encryption standard, AES (Advanced Encryption
Standard), with three key sizes, 128, 192, and 256 bits, the security of SHA-1 no
longer matched the security guaranteed by the encryption standard. Therefore, an
effort was initiated by NSA to develop three new hash functions, with security
equivalent to that of AES with 128-, 192-, and 256-bit keys respectively. This
effort resulted in the development and standardization of three new hash functions
referred to as SHA-256, SHA-384, and SHA-512 [1].
All four standardized algorithms have a similar internal structure and operation. All
of them are based on sequential processing of consecutive blocks of data, and there- 
fore cannot be easily sped up by using pipelining or parallel processing (at least when 
only one stream of data is being processed). 

The majority of reported implementations of SHA-1 based on the current
generation of FPGA devices, such as Virtex [2], can only reach throughputs of up to 500
Mbit/s [3-9]. Higher speeds can only be accomplished by using more expensive
FPGA devices, such as Virtex-E or Virtex II (see Table 1). Similarly, the FPGA
implementations of SHA-512 based on the medium-cost Virtex devices reach speeds
in the range of 700 Mbit/s [3,4].

Significantly higher speeds might be required for applications such as High Defini- 
tion Television (HDTV), videoconferencing, Virtual Private Networks, etc. [10]. Our 
goal was to propose, implement, and verify a new architecture of standard hash func- 
tions that would allow them to be executed with the throughputs in the range of 1 
Gbit/s using medium cost FPGA devices, such as Xilinx Virtex 1000. 



2 Hardware Architectures of Hash Functions 

A general block diagram common for all four SHA standards and many other dedi- 
cated hash functions is shown in Fig. 1. An input message passes first through the 
preprocessing unit which performs padding and forms message blocks of the fixed 
length, 512 or 1024 bits, depending on the hash function. The preprocessing unit 
passes message blocks to the message scheduler unit. The message scheduler unit
generates message dependent words, $W_t$, for each step of the message digest. The message
digest unit performs actual hashing. In each step, it processes a new word generated 
by the message scheduler unit. The message digest is the most critical part of the im- 
plementation, as it determines both the speed and area of the circuit. 

The most straightforward implementation of the message digest, most often used in
practice, is shown in Fig. 2a. It is called the basic iterative architecture (or just basic
architecture). In this architecture, registers R and H are first both initialized with the
value of the constant initialization vector, IV. Subsequently, the architecture executes
one step of the message digest per clock period. In each step $t$, the message digest
accepts a different message dependent word, $W_t$, and a different step dependent
constant, $K_t$. After executing all steps, the result of the last step, stored in the register R, is
added to the previous value of the register H. Then, the processing of the message
digest resumes for a new set of the message dependent words, $W_t$, corresponding to the
new block of the message.

Fig. 1. General block diagram of the hardware implementation of a dedicated hash function,
such as SHA-1 and SHA-512

Fig. 2. General diagrams of the message digest units for a) basic architecture, b) partially
unrolled architecture with k steps unrolled
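The following Python model of the basic iterative architecture may help to fix ideas. It is a behavioural sketch, not the authors' VHDL: each iteration of the inner loop stands for one clock period, `R` and `H` mirror the two registers of Fig. 2a, and the function names `preprocessing_unit` and `message_scheduler` are hypothetical labels for the units of Fig. 1. The SHA-1 details (padding, $W_t$ recurrence, $f_t$, $K_t$, IV) follow the public standard, and the result is checked against Python's `hashlib`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
K = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20


def rotl(x: int, s: int) -> int:
    return ((x << s) | (x >> (32 - s))) & MASK


def f(t: int, b: int, c: int, d: int) -> int:
    if t < 20:
        return (b & c) | (~b & d)               # Ch
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)      # Maj
    return b ^ c ^ d                            # Parity


def preprocessing_unit(msg: bytes) -> list:
    """Padding, then splitting into 512-bit message blocks."""
    bit_len = 8 * len(msg)
    msg += b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    return [msg[i:i + 64] for i in range(0, len(msg), 64)]


def message_scheduler(block: bytes) -> list:
    """The 80 message dependent words W_t."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def sha1_basic_iterative(msg: bytes) -> bytes:
    H = IV[:]                                   # register H
    for block in preprocessing_unit(msg):
        W = message_scheduler(block)
        R = H[:]                                # register R starts from the current H
        for t in range(80):                     # one message digest step per clock period
            A, B, C, D, E = R
            R = [(rotl(A, 5) + f(t, B, C, D) + E + K[t] + W[t]) & MASK, A, rotl(B, 30), C, D]
        H = [(h + x) & MASK for h, x in zip(H, R)]   # add R to the previous value of H
    return struct.pack(">5I", *H)


print(sha1_basic_iterative(b"abc").hex())   # a9993e364706816aba3e25717850c26c9cd0d89d
```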

Two straightforward ways of speeding up hardware implementations of hash func- 
tions (and any other logic functions) are parallel processing using multiple instantia- 
tions of the basic architecture, and pipelining. Out of these two methods, pipelining is 
more attractive because of the smaller area penalty. Nevertheless, both of these archi- 
tectures are able to improve an average circuit throughput only under the assumption 
that multiple independent streams of data are processed simultaneously. If a single 
long message needs to be hashed, none of these architectures offers any improvement 
in terms of the execution time. 

A new architecture of the dedicated hash functions investigated in this paper is 
shown in Fig. 2b. It is called partially unrolled architecture. In this architecture, k 
steps have been “unrolled” and are executed in the same clock cycle. As a result, the 
total number of clock cycles necessary to compute one iteration of the message digest 
has been reduced by a factor of k. At the same time, the critical path through k steps is 
likely to be significantly shorter than k times the path through a single step. This is 
because in hash functions, the critical path through a step of the message digest is dif- 
ferent for each word of the step input (see Fig. 3). 
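The trade-off can be quantified with the usual throughput estimate: block size times clock frequency, divided by the number of clock cycles per block. The sketch below assumes exactly $80/k$ cycles per block for an 80-step function (any extra cycles for loading or finalization are ignored, since they are not specified here) and uses made-up clock frequencies. It illustrates that unrolling pays off as long as the clock frequency drops by less than a factor of $k$.

```python
def digest_cycles(steps: int, k: int) -> int:
    """Clock cycles per message block when k steps are unrolled into one cycle."""
    if steps % k:
        raise ValueError("this sketch assumes k divides the number of steps")
    return steps // k


def throughput_mbit_s(block_bits: int, f_clk_mhz: float, cycles: int) -> float:
    """Throughput = block size x clock frequency / cycles per block (overheads ignored)."""
    return block_bits * f_clk_mhz / cycles


# SHA-1: 512-bit blocks, 80 steps. The clock frequencies are hypothetical examples.
basic = throughput_mbit_s(512, 100.0, digest_cycles(80, 1))
unrolled = throughput_mbit_s(512, 40.0, digest_cycles(80, 5))
print(basic, unrolled)   # 640.0 1280.0
```

In this made-up example the unrolled circuit runs at 40% of the basic clock frequency, yet doubles the throughput, because it needs five times fewer cycles per block.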



3 Previous Work 

Fully and partially unrolled architectures of dedicated hash functions have been
investigated by several authors in the past, but no definite conclusions have been made. In
[11] a fully unrolled architecture of MD5 has been compared with a basic iterative
architecture. Unrolling of all 64 rounds resulted in a throughput increase by a factor of
2.1, while at the same time the circuit area increased by a factor of 5.4. In [12] a
partially unrolled architecture of SHA-1, with the number of rounds unrolled k=5, has
been investigated. The high-level architecture presented in [12] was very similar to
the one proposed in this paper. Nevertheless, the reported results were rather
discouraging, with only an 11% gain in the circuit throughput and a 43% penalty in the
circuit area for the partially unrolled architecture over the basic iterative architecture.

All other hardware implementations of dedicated hash functions reported in the lit- 
erature [9, 13, 14] or available as commercial IP cores [3-8] have followed the basic 
iterative architecture with only one step of hash function executed in each clock cycle. 



4 Details of the Hardware Architectures 

4.1 Internal Structure of the Message Digests of SHA-1 and SHA-512 

Internal structures of the message digests for SHA-1 and SHA-512 are shown in Fig. 
3. In both functions, input registers are initialized with the constant initialization vec- 
tor, and are updated with the new value in each round. In SHA-1, four out of five 
words (A, B, C, and D) remain almost unchanged by a single round. These words are
only shifted by one position down. The last word, E, undergoes a complicated
transformation equivalent to multioperand addition modulo $2^{32}$, with five 32-bit operands
dependent on all input words, the round-dependent constant $K_t$, and the message
dependent word $W_t$. The internal structure of the message digest of SHA-512 is similar.
The primary differences are as follows: the number of words processed by each
round is 8, each word is 64 bits long, and the longest path is equivalent to addition of
seven 64-bit operands modulo $2^{64}$. These operands depend on seven out of eight input
words (all except D), the round-dependent constant $K_t$, and a message dependent word
$W_t$. Six out of eight input words remain unchanged by a single round.



4.2 Basic Architecture of SHA-1 

From Fig. 3a, the critical path of a single SHA-1 round involves the calculation of the
chaining variable A at the moment $t+1$, given by the following formula:

$$A_{t+1} = (A_t \lll 5) + f_t(B_t, C_t, D_t) + E_t + K_t + W_t + HA'_t$$

where $X_t$ is the value of the variable X in step $t$, and $HA'_t = HA$ when $t = 79$,
otherwise 0. HA is the word A of the register H in Fig. 2a.

Fig. 3. Internal structure of a single message digest round of a) SHA-1, b) SHA-512

Fig. 4. Our implementation of the message digest unit of SHA-1 in the basic iterative
architecture

Additionally, we know that

$$B_t = A_{t-1}, \qquad C_t = B_{t-1} \lll 30, \qquad D_t = C_{t-1}.$$

None of these operations involve any logic; consequently, the expression

$$f_t(B_t, C_t, D_t) = f_t(A_{t-1}, B_{t-1} \lll 30, C_{t-1})$$

can be precomputed in the previous clock cycle, $t-1$, and will not contribute to the
critical path. Similarly, the sum

$$\Sigma_{HA,K,W,t} = K_t + W_t + HA'_t$$

can be precomputed by the message scheduler unit, because all values are already
known in the previous clock cycle.

As a result, the critical path reduces to the addition of four operands

$$A_{t+1} = (A_t \lll 5) + E_t + \Sigma_{HA,K,W,t} + f_t(A_{t-1}, B_{t-1} \lll 30, C_{t-1}).$$

All aforementioned optimizations lead to the schematic of the basic architecture of
SHA-1 shown in Fig. 4. The lowest-level multiplexers choose the words of the
initialization vector only in the first clock cycle of computations for any new message. The
variables HB'..HE' are equal to HB..HE only in the last step of the message digest
computations for a given message block, i.e., only when $t = 79$; otherwise, they are
equal to zero.
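The precomputation argument can be checked in software. In the behavioural sketch below (our own model, not the circuit of Fig. 4), $f_{t+1}$ and $K_{t+1} + W_{t+1} + HA'_{t+1}$ are formed during cycle $t$ from register values that are already available, so each step only adds four operands. $HA$ is folded into the last step, while $HB'..HE'$ are added to the other words at the end. The helper names are hypothetical, and the digest is compared with `hashlib`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
K = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20


def rotl(x: int, s: int) -> int:
    return ((x << s) | (x >> (32 - s))) & MASK


def f(t: int, b: int, c: int, d: int) -> int:
    if t < 20:
        return (b & c) | (~b & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def message_words(msg: bytes):
    """Standard SHA-1 padding and message schedule: yields W_0..W_79 for each block."""
    bit_len = 8 * len(msg)
    msg += b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    for i in range(0, len(msg), 64):
        w = list(struct.unpack(">16I", msg[i:i + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        yield w


def compress_precomputed(H: list, W: list) -> list:
    A, B, C, D, E = H
    # First cycle: nothing was precomputed yet, so f_0 and the sum are formed directly.
    F = f(0, B, C, D)
    S = (K[0] + W[0]) & MASK
    for t in range(80):
        A_next = (rotl(A, 5) + E + S + F) & MASK   # the four-operand critical path
        if t < 79:
            # Off the critical path: operands for step t+1 from the current registers.
            F = f(t + 1, A, rotl(B, 30), C)        # f_{t+1}(B_{t+1}, C_{t+1}, D_{t+1})
            HA_prime = H[0] if t + 1 == 79 else 0  # HA'_{t+1}
            S = (K[t + 1] + W[t + 1] + HA_prime) & MASK
        A, B, C, D, E = A_next, A, rotl(B, 30), C, D
    # A already includes HA; HB'..HE' are added to the remaining words.
    return [A] + [(x + h) & MASK for x, h in zip((B, C, D, E), H[1:])]


def sha1_model(msg: bytes) -> bytes:
    H = IV[:]
    for W in message_words(msg):
        H = compress_precomputed(H, W)
    return struct.pack(">5I", *H)


print(sha1_model(b"abc") == hashlib.sha1(b"abc").digest())   # True
```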
4.3 Partially Unrolled Architecture of SHA-1

The optimization of the unrolled message digest is relatively straightforward. The 
general technique employed is to precalculate sums at the earliest possible stage using 
either regular carry propagate adders (CPAs) or carry save adders (CSAs) (see Fig. 5). 
The calculations in the critical path follow a sequence of computations described by 
the equations below: 




Fig. 5. Our implementation of the message digest unit of SHA-1 in the partially unrolled archi- 
tecture with 5 steps unrolled 
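$$
\begin{aligned}
A_{t+1} &= (A_t \lll 5) + f_t(B_t, C_t, D_t) + E_t + K_t + W_t = (A_t \lll 5) + f_t(B_t, C_t, D_t) + E_t + \Sigma K_tW_t\\
A_{t+2} &= (A_{t+1} \lll 5) + f_{t+1}(B_{t+1}, C_{t+1}, D_{t+1}) + E_{t+1} + K_{t+1} + W_{t+1}\\
&= (A_{t+1} \lll 5) + [f_{t+1}(A_t, B_t \lll 30, C_t) + [D_t + \Sigma K_{t+1}W_{t+1}]]\\
A_{t+3} &= (A_{t+2} \lll 5) + [f_{t+2}(A_{t+1}, A_t \lll 30, B_t \lll 30) + [C_t + \Sigma K_{t+2}W_{t+2}]]\\
A_{t+4} &= (A_{t+3} \lll 5) + [f_{t+3}(A_{t+2}, A_{t+1} \lll 30, A_t \lll 30) + [(B_t \lll 30) + \Sigma K_{t+3}W_{t+3}]]\\
A_{t+5} &= (A_{t+4} \lll 5) + [f_{t+4}(A_{t+3}, A_{t+2} \lll 30, A_{t+1} \lll 30) + [(A_t \lll 30) + \Sigma K_{t+4}W_{t+4} + HA'_{t+4}]]
\end{aligned}
$$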

At each stage two paths are critical. One is the calculation of the new value of $A_{t+i}$
($i = 1..5$), which involves rotation by five positions and a single addition. The second is
the precalculation of the value of $[f_{t+i} + [E_{t+i} + \Sigma K_{t+i}W_{t+i}]]$ to be used in the
next stage. This precalculation involves the calculation of $f_{t+i}$ and a single addition of a
precalculated value $[E_{t+i} + \Sigma K_{t+i}W_{t+i}]$.

In the first stage of computations (computing $A_{t+1}$), precalculated values do not
exist, so the computations must be performed from scratch. In every second stage
starting from stage two, the precomputation of the sum $[f_{t+i} + [E_{t+i} + \Sigma K_{t+i}W_{t+i}]]$ is
the most time-consuming operation. Finally, in every second stage starting from stage
three, the only contribution to the critical path is a single addition.
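The five stage equations above can be executed directly. The sketch below is a software check of the algebra, not a model of the adders in Fig. 5. Each loop iteration performs the five steps of one clock cycle using only the register values $A_t..E_t$ and the newly computed $A_{t+1}..A_{t+4}$. After the cycle, the registers are reloaded with $A_{t+5}, A_{t+4}, A_{t+3} \lll 30, A_{t+2} \lll 30, A_{t+1} \lll 30$ (our own derivation from $B_t = A_{t-1}$, $C_t = B_{t-1} \lll 30$, and so on). `kw(t)` is a hypothetical helper for $\Sigma K_tW_t$ with $HA'_{79}$ folded in.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
K = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20


def rotl(x: int, s: int) -> int:
    return ((x << s) | (x >> (32 - s))) & MASK


def f(t: int, b: int, c: int, d: int) -> int:
    if t < 20:
        return (b & c) | (~b & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def message_words(msg: bytes):
    """Standard SHA-1 padding and message schedule: yields W_0..W_79 for each block."""
    bit_len = 8 * len(msg)
    msg += b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    for i in range(0, len(msg), 64):
        w = list(struct.unpack(">16I", msg[i:i + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        yield w


def compress_unrolled5(H: list, W: list) -> list:
    def kw(t: int) -> int:                      # Sigma K_t W_t, plus HA'_t at t = 79
        return (K[t] + W[t] + (H[0] if t == 79 else 0)) & MASK

    A, B, C, D, E = H
    for t in range(0, 80, 5):                   # one clock cycle = five steps
        A1 = (rotl(A, 5) + f(t, B, C, D) + E + kw(t)) & MASK
        A2 = (rotl(A1, 5) + f(t + 1, A, rotl(B, 30), C) + D + kw(t + 1)) & MASK
        A3 = (rotl(A2, 5) + f(t + 2, A1, rotl(A, 30), rotl(B, 30)) + C + kw(t + 2)) & MASK
        A4 = (rotl(A3, 5) + f(t + 3, A2, rotl(A1, 30), rotl(A, 30)) + rotl(B, 30) + kw(t + 3)) & MASK
        A5 = (rotl(A4, 5) + f(t + 4, A3, rotl(A2, 30), rotl(A1, 30)) + rotl(A, 30) + kw(t + 4)) & MASK
        A, B, C, D, E = A5, A4, rotl(A3, 30), rotl(A2, 30), rotl(A1, 30)
    # A already includes HA; HB'..HE' are added to the remaining words.
    return [A] + [(x + h) & MASK for x, h in zip((B, C, D, E), H[1:])]


def sha1_unrolled(msg: bytes) -> bytes:
    H = IV[:]
    for W in message_words(msg):
        H = compress_unrolled5(H, W)
    return struct.pack(">5I", *H)


print(sha1_unrolled(b"abc") == hashlib.sha1(b"abc").digest())   # True
```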



4.4 Basic Architecture of SHA-512 

From Fig. 3b, the critical path of a single SHA-512 round involves the calculation of
the chaining variable A at the moment $t+1$, given by the following formula:

$$A_{t+1} = \mathrm{S0}(A_t) + \mathrm{Maj}(A_t, B_t, C_t) + \mathrm{S1}(E_t) + \mathrm{Ch}(E_t, F_t, G_t) + K_t + W_t + H_t + HA'_t$$

where $X_t$ is the value of the variable X in step $t$; S0, Maj, S1, Ch are the logic
functions defined in the SHA-512 standard, and $HA'_t = HA$ when $t = 79$, otherwise 0.
Additionally, we know that

$$H_t = G_{t-1}.$$

The functions S0 and Maj execute in parallel in approximately the same amount of
time. The same holds true for functions S1 and Ch.

The sum

$$KWHA_t = K_t + W_t + G_{t-1} + HA'_t$$

can be precomputed in the previous clock cycle, $t-1$.

As a result, the critical path reduces to the addition of five operands

$$A_{t+1} = \mathrm{S0}(A_t) + \mathrm{Maj}(A_t, B_t, C_t) + \mathrm{S1}(E_t) + \mathrm{Ch}(E_t, F_t, G_t) + KWHA_t.$$
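A behavioural Python sketch of this basic SHA-512 round, with $KWHA$ precomputed one cycle early, is given below. It is not the authors' design. The round functions and constants follow the public SHA-512 standard; to keep the block short, the constants are derived from their definition (64-bit fractional parts of cube and square roots of the first primes) instead of being listed. Two further choices are ours: the E path uses the same sum without $HA'$, and $HA$ is folded into the last A step as described above. Names are hypothetical; the result is checked against `hashlib.sha512`.

```python
import hashlib
import math

M64 = (1 << 64) - 1


def first_primes(count: int) -> list:
    primes, c = [], 2
    while len(primes) < count:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def icbrt(x: int) -> int:
    """Floor of the cube root of x (integer Newton iteration started above the root)."""
    y = 1 << ((x.bit_length() + 2) // 3)
    while True:
        z = (2 * y + x // (y * y)) // 3
        if z >= y:
            return y
        y = z


PRIMES = first_primes(80)
K = [icbrt(p << 192) & M64 for p in PRIMES]             # fractional parts of cube roots
IV = [math.isqrt(p << 128) & M64 for p in PRIMES[:8]]   # fractional parts of square roots


def rotr(x: int, s: int) -> int:
    return ((x >> s) | (x << (64 - s))) & M64


def S0(a: int) -> int:
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)


def S1(e: int) -> int:
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)


def schedule(block: bytes) -> list:
    w = [int.from_bytes(block[i:i + 8], "big") for i in range(0, 128, 8)]
    for t in range(16, 80):
        s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
        s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & M64)
    return w


def compress(hv: list, W: list) -> list:
    a, b, c, d, e, f, g, h = hv
    kwh = (K[0] + W[0] + h) & M64                # first cycle: formed directly
    kwha = kwh                                   # HA'_0 = 0
    for t in range(80):
        s1ch = S1(e) + ((e & f) ^ (~e & g))      # S1 + Ch
        a_next = (S0(a) + ((a & b) ^ (a & c) ^ (b & c)) + s1ch + kwha) & M64   # five operands
        e_next = (d + s1ch + kwh) & M64
        if t < 79:
            # precomputed for step t+1: H_{t+1} = G_t is already held in register g
            kwh = (K[t + 1] + W[t + 1] + g) & M64
            kwha = (kwh + (hv[0] if t + 1 == 79 else 0)) & M64   # KWHA_{t+1}
        a, b, c, d, e, f, g, h = a_next, a, b, c, e_next, e, f, g
    # a already includes HA; the other words get HB'..HH' added
    return [a] + [(x + y) & M64 for x, y in zip((b, c, d, e, f, g, h), hv[1:])]


def sha512_model(msg: bytes) -> bytes:
    bit_len = 8 * len(msg)
    msg += b"\x80" + b"\x00" * ((111 - len(msg)) % 128) + bit_len.to_bytes(16, "big")
    hv = IV[:]
    for i in range(0, len(msg), 128):
        hv = compress(hv, schedule(msg[i:i + 128]))
    return b"".join(x.to_bytes(8, "big") for x in hv)


print(sha512_model(b"abc") == hashlib.sha512(b"abc").digest())   # True
```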

All aforementioned optimizations lead to the schematic of the basic architecture of
SHA-512 shown in Fig. 6. The registers HA-HH are set to the words of the initialization
vector only in the first clock cycle of computations for any new message. The
multiplexers selecting between HB and ‘0’, HC and ‘0’, etc. choose non-zero values
only in the last step of the message digest computations for a given message block,
i.e., only when $t = 79$.
Fig. 6. Our implementation of the message digest unit of SHA-512 in the basic iterative
architecture (PC - a 5-to-3 parallel counter, see [9]) 



4.5 Unrolled Architecture of SHA-512 

The unrolled architecture of SHA-512 is shown in Fig. 7. Because of the dependence
of $E_{t+1}$ on $E_t$, and of $A_{t+1}$ on $A_t$ and $E_t$ (see Fig. 3b), three major critical paths (A0 to A0,
E0 to A0 and E0 to E0) exist in the circuit. These paths are marked in Fig. 7 with
thicker lines. Values of variables $A_{t+i}$ and $E_{t+i}$ are denoted as “Ai” and “Ei”
respectively, e.g., “E2” denotes $E_{t+2}$. Precomputations in the previous clock cycle are used to
reduce the number of operands in the first four stages of the unrolled architecture.
Recall that in the basic architecture, the $KWHA_t$ sum is computed based on the
equation $H_t = G_{t-1}$. In the unrolled architecture with $k = 5$, $t$ changes by 5 every clock cycle.
As a result, $H_t = G_{t-1} = F_{t-2} = E_{t-3}$ = “E2” in the previous clock cycle.
On the far left side of Fig. 7, “E2” is used to precompute KWH0 (notation for
$KWHA_t$) for the next clock cycle:

$$KWH0 = KWHA_t = K_t + W_t + H_t + HA'_t.$$

This method is repeated in stages two to four in order to compute $KWHA_{t+i}$ (denoted
in Fig. 7 as KWHi, $i = 1..3$). In stage 5, $H_{t+4} = E_{t+1}$ = “E1”, so this value is computed in
the same clock cycle, and as a result is not included in the earlier precomputed
$KWH4 = KWHA_{t+4}$, which reduces to $KWHA_{t+4} = K_{t+4} + W_{t+4}$. Note that in Fig. 7 this
sum is denoted as KW4.
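The bookkeeping behind “E2” and “E1” can be checked symbolically. The short sketch below is our own illustration; the function names are hypothetical. It tracks which earlier E value the plain-copy registers F, G and H hold after each SHA-512 step, confirming $H_s = E_{s-3}$. It then maps $H_{t+i}$, for the five stages of a cycle starting at step $t$, onto the “Ej” outputs of the previous cycle (which started at $t - 5$) or of the current one.

```python
def shift_history(steps: int) -> list:
    """Symbolic value held by register H after each SHA-512 step."""
    regs = {"E": "E0", "F": "F0", "G": "G0", "H": "H0"}
    history = []
    for s in range(1, steps + 1):
        # F, G and H are plain copies of the previous E, F and G (no logic involved)
        regs = {"E": f"E{s}", "F": regs["E"], "G": regs["F"], "H": regs["G"]}
        history.append(regs["H"])
    return history


def kwha_source(i: int, k: int = 5) -> str:
    """Unrolled output that supplies H_{t+i}, for a clock cycle starting at step t."""
    offset = i - 3                               # H_{t+i} = E_{t+i-3}
    if offset >= 1:
        return f"E{offset} (current cycle)"
    if offset + k < 1:
        raise ValueError("would need a value from two or more cycles back")
    return f"E{offset + k} (previous cycle)"     # the previous cycle started at step t - k


print(shift_history(6))                  # ['G0', 'F0', 'E0', 'E1', 'E2', 'E3']
print([kwha_source(i) for i in range(5)])
```

The second line lists “E2” to “E5” from the previous cycle for stages one to four, and “E1” from the current cycle for stage five, which is why only $K_{t+4} + W_{t+4}$ can be precomputed for the last stage.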




Fig. 7. Our implementation of the message digest unit of SHA-512 in the partially unrolled ar- 
chitecture with 5 steps unrolled 
Further reductions in critical paths were accomplished in each stage by adding values
of logic functions S1 and Ch as early as possible, reusing values of S1 + Ch, and
by selective routing to balance the number of slices in various critical paths.



5 Design Methodology and Results 

Our target FPGA device was the Xilinx Virtex XCV1000-6. This device is composed of 12,288 basic logic cells referred to as CLB (Configurable Logic Block) slices, includes 32 4-kbit blocks of synchronous dual-ported RAM, and can achieve synchronous system clock rates up to 200 MHz [2]. XCV1000 was chosen because of the availability of a general-purpose PCI board, SLAAC-1V, based on three FPGA devices of this type [10]. Additionally, the newer Virtex-E family of Xilinx devices was targeted as well.

All hardware architectures were first described in VHDL, and their operation was verified through functional simulation using Active HDL from Aldec, Inc. Test vectors and intermediate results from the reference software implementations based on the Crypto++ library [15] were used for debugging and verification of the VHDL code. The revised VHDL code became an input to logic synthesis performed using FPGA Compiler II from Synopsys. Tools from Xilinx ISE 4.2 were used for mapping, placing, and routing. These tools generated reports describing the area and speed of the implementation, a netlist used for timing simulation, and a bitstream used to configure an actual FPGA device. All designs were fully verified through behavioral, post-synthesis, and timing simulations.

The experimental testing of our cryptographic modules was performed using the SLAAC-1V hardware accelerator board, which includes three Virtex 1000 FPGAs as the primary processing elements. Only one of the three FPGA devices was used to implement the hash core.

A test program written in C used the SLAAC-1V APIs and the SLAAC-1V driver to communicate with the board. Our testing procedure is composed of three groups of tests. The first group verifies the circuit functionality at a single clock frequency. The goal of the second group is to determine the maximum clock frequency at which the circuit operates correctly. Finally, the purpose of the third group is to determine the limit on the maximum hashing throughput, taking into account the limitations of the PCI interface.

In Fig. 8, the minimum clock periods of SHA-1 and SHA-512 obtained using static timing analysis and the experiment are given. For the unrolled architecture, the effective clock period is the minimum time necessary for the data signals to pass the critical path. Since in both our unrolled designs the data signal travels through the critical path over multiple clock periods, the effective clock period is a multiple of the actual clock period. In the case of the unrolled architecture for SHA-1, the multiplication factor is 2; in the case of the SHA-512 architecture, the multiplication factor is 5.

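Based on the knowledge of the minimum clock period, the maximum data throughput has been computed according to the equation:

$$\text{Throughput} = \frac{\text{Message\_block\_size}}{\text{Effective\_clock\_period} \cdot \text{Number\_of\_rounds} / k}$$

where $k$ is the number of rounds unrolled ($k = 1$ for the basic iterative architecture).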
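A minimal Python sketch of this equation, assuming the clock frequencies listed in Tables 1 and 2, 80 rounds for both SHA-1 and SHA-512, and message blocks of 512 and 1024 bits, respectively. The function `throughput_mbps` and its `multicycle` parameter (the factor relating the effective to the actual clock period, 2 for the unrolled SHA-1 design) are hypothetical helpers, not part of the authors' tool flow.

```python
def throughput_mbps(block_bits, clock_mhz, rounds, k=1, multicycle=1):
    """Throughput = block_bits / (effective_clock_period * rounds / k).

    clock_mhz   actual clock frequency in MHz (one period = 1/clock_mhz us)
    k           number of rounds unrolled (1 for the basic iterative design)
    multicycle  effective clock period as a multiple of the actual one
    Bits per microsecond equal Mbit/s, so the result is in Mbit/s.
    """
    effective_period_us = multicycle / clock_mhz
    periods_per_block = rounds / k
    return block_bits / (effective_period_us * periods_per_block)

designs = [
    # (name, block size in bits, clock in MHz, k, multicycle factor)
    ("SHA-1 basic, Virtex", 512, 85, 1, 1),
    ("SHA-1 unrolled, Virtex", 512, 64, 5, 2),
    ("SHA-1 unrolled, Virtex-E", 512, 72.5, 5, 2),
    ("SHA-512 basic, Virtex", 1024, 56, 1, 1),
]
for name, bits, mhz, k, m in designs:
    print(f"{name}: {throughput_mbps(bits, mhz, 80, k, m):.1f} Mbit/s")
```

The first three rows come out at 544, 1024, and 1160 Mbit/s, matching the corresponding entries of Table 1; the basic SHA-512 core at 56 MHz gives about 717 Mbit/s.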
The maximum throughput values calculated based on the minimum clock periods obtained using static timing analysis and experiment are shown in Fig. 9. In the same figure, these results are compared with the experimentally measured data throughputs that take into account the delay contributions and the bandwidth limit of the PCI interface. This comparison demonstrates that the PCI interface is capable of operating with a constant uninterrupted data flow up to about 960-990 Mbit/s, and has a negligible influence on the data throughput below this communication rate.

The numbers of CLB slices used by our implementations of SHA-1 and SHA-512 are shown in Tables 1 and 2. In SHA-512, four 4-kbit block RAMs are used to store the 80 64-bit constants $K_t$.

Of the two analyzed hash standards, SHA-1 offers much better potential for loop unrolling. As a result of loop unrolling, the throughput of SHA-1 increased by a factor of almost two (1.9 times), while its area grew by a factor of three. SHA-512 is much less suitable for loop unrolling, as its observed speed-up was only 30%, and its area increased by 48%.
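The speed-up and area factors quoted above can be re-derived from the Xilinx Virtex entries of Tables 1 and 2. In this small sketch the basic SHA-512 throughput is computed from its 56 MHz clock with the throughput equation (1024-bit block, 80 rounds, $k = 1$) rather than read from the table; the dictionary layout and the `gains` helper are illustrative.

```python
# Xilinx Virtex figures: (throughput in Mbit/s, area in CLB slices)
sha1 = {"basic": (544, 480), "unrolled": (1024, 1480)}
sha512 = {"basic": (1024 * 56 / 80, 2384), "unrolled": (929, 3521)}

def gains(design):
    """Return (throughput ratio, area ratio) of the unrolled vs. the basic design."""
    (t_basic, a_basic), (t_unr, a_unr) = design["basic"], design["unrolled"]
    return t_unr / t_basic, a_unr / a_basic

for name, design in (("SHA-1", sha1), ("SHA-512", sha512)):
    speedup, area = gains(design)
    print(f"{name}: throughput x{speedup:.2f}, area x{area:.2f}")
```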
Fig. 8. Minimum clock periods of SHA-1 and SHA-512 in the basic iterative architecture and partially unrolled architecture



Fig. 9. Maximum throughputs of SHA-1 and SHA-512 in the basic iterative architecture and partially unrolled architecture



6 Comparison with Other Hash Cores 

There exist multiple commercial IP cores implementing SHA-1 [3-8]. In Table 1, we present the comparison of our designs for SHA-1 with the most representative IP cores with equivalent functionality. For the Xilinx Virtex family of FPGA devices, our core for SHA-1 in the basic iterative architecture outperforms the second best core (from Helion Technology Ltd.) by 13%, using 30% fewer CLB slices. Our core for the partially unrolled architecture of SHA-1 with 5 rounds unrolled outperforms all reported Virtex cores by a factor of at least two in terms of throughput, and uses about two times more area. Similar advantages exist for the implementations using Virtex-E devices, where our core for the unrolled architecture approaches the throughput of 1.2 Gbit/s.



Table 1. Comparison of our designs for SHA-1 with the representative commercial IP cores with equivalent functionality

Source                             Clock frequency [MHz]   Throughput [Mbit/s]   Area [CLB slices]
Xilinx Virtex
Our, basic                         85                      544                   480
Our, unrolled (k=5)                64¹                     1024                  1480
ALMA Technologies                  70                      442                   686
Helion Technology Ltd.             76                      480                   689
Ocean Logic Pty Ltd                56                      352                   612
Xilinx Virtex-E
Our, basic                         103                     659                   484
Our, unrolled (k=5)                72.5                    1160                  1484
ALMA Technologies                  87                      549                   686
Bisquare Systems Private Limited   66                      422                   579
Helion Technology Ltd.             95                      600                   689
Intron, Ltd.                       71                      449                   716
Ocean Logic Pty Ltd                71.5                    452                   612
Xilinx Virtex-II
ALMA Technologies                  102                     644                   686
Amphion Semiconductor              99                      626                   854
Helion Technology Ltd.             103.5                   654                   569
Ocean Logic Pty Ltd                79                      498                   612


Table 2. Comparison of our designs for SHA-512 with the representative commercial IP cores with equivalent functionality

Source                    Clock frequency [MHz]   Throughput [Mbit/s]   Area³ [CLB slices]
Xilinx Virtex
Our, basic                56                      717                   2384
Our, unrolled (k=5)       6f                      929                   3521
ALMA Technologies         56                      707                   2690
Xilinx Virtex-E
Our, unrolled (k=5)       If                      1034                  3517
ALMA Technologies         68                      859                   2690
Xilinx Virtex-II
ALMA Technologies         72                      910                   2507
Amphion Semiconductor     50                      626                   2403

¹ multi-cycle clock used in the critical path; critical path < 2·T_clk = 2/f_clk; 5 steps executed in 2 clock cycles
² multi-cycle clock used in the critical path; critical path < 5·T_clk = 5/f_clk; 5 steps executed in 5 clock cycles
³ each circuit additionally contains 4 Block RAMs
At this point, there are relatively few cores available for the new standard, SHA-512 (see Table 2) [3, 4]. Our implementation of the basic iterative architecture slightly outperforms the equivalent core from ALMA Technologies in terms of throughput, using a smaller amount of FPGA resources. Our partially unrolled architecture is the fastest core for the Virtex family of FPGA devices, outperforming the second best core by 30% at the cost of only a 31% increase in the circuit area. For the Virtex-E family of FPGA devices, our core is the only currently available SHA-512 core that exceeds the throughput of 1 Gbit/s.



7 Comparison with Software Implementations 

Efficient software implementations of hash functions have been extensively studied in the literature [17-20]. In [17], basic recommendations on developing an efficient and portable implementation of SHA-1 in C have been formulated. In [18], close-to-optimum implementations of dedicated hash functions using the Pentium's superscalar architecture have been presented. In [19], the software parallelism of all major dedicated hash functions has been studied. Finally, in [20], optimizations targeting the Pentium III have been investigated. These optimizations made use of the MMX registers and instructions available in the Pentium III.

For comparison, we used the software implementations of SHA-1 and SHA-512 available as a part of the Crypto++ library [15]. Although Crypto++ is not the fastest of the reported software implementations, we chose this library for its portability, its availability in the public domain, and its wide practical deployment.

A PC with a 2.2 GHz clock, 1 GByte of RAM, and a 512 KB cache, running Windows XP, was used in our measurements. The Crypto++ implementation of the hash functions, written in C++, was compiled using MS Visual Studio with Service Pack 5. The obtained throughput was 40.5 Mbit/s for SHA-1 and 30.4 Mbit/s for SHA-512. These throughputs were, respectively, 25 times and 31 times smaller than the throughputs of our partially unrolled hardware implementations of SHA-1 and SHA-512 for Xilinx Virtex 1000-6 FPGAs.



8 Summary 

A new partially unrolled architecture has been proposed for a family of dedicated hash functions, including four American standard algorithms: SHA-1, SHA-256, SHA-384, and SHA-512. The unrolled architecture has been designed, optimized, and experimentally verified for the most widely used hash algorithm, SHA-1, and for one of the new hash standard algorithms, SHA-512. For the purpose of comparison, the basic iterative architecture has been implemented for both functions as well.

The new architecture proved particularly suitable for the implementation of SHA-1. With k=5 rounds unrolled, it almost doubled the throughput of SHA-1 compared to the basic iterative architecture, at the cost of increasing the circuit area by a factor of three. A similar design for SHA-512 brought much less benefit: the circuit throughput increased by only 30%, while the circuit area increased by 48%.
This different behavior of the two hash algorithms can be explained by analyzing their structure. In the unrolled architecture of SHA-1, many message digest steps could be substantially sped up by precomputing partial results of a given step in the previous steps. The same optimization was not possible in SHA-512 due to the sequential dependencies present in the algorithm.

Our partially unrolled implementation of SHA-1 reached the target throughput of 1 Gbit/s in the Virtex XCV1000, and outperformed all commercial IP cores with equivalent functionality known to the authors by at least a factor of two. Our implementation of SHA-512 also compared favorably with commercial IP cores, and reached the target throughput of 1 Gbit/s using the Virtex-E family of Xilinx FPGAs. To the best of our knowledge, our implementations of SHA-1 and SHA-512 are the only FPGA implementations of these hash functions available to date that can sustain a throughput over 1 Gbit/s for a single stream of data.
Abstract. A hash chain is a sequence of hash values $x_i = \mathrm{hash}(x_{i-1})$ for some initial secret value $x_0$. It allows one to reveal the final value $x_n$ and to gradually disclose the pre-images $x_{n-1}, x_{n-2}, \ldots$ whenever necessary. The correctness of a given value $x_i$ can then be verified by re-computing the chain and comparing the result to $x_n$. Here we present a method to speed up the verification by outputting some extra information in addition to the chain's end value $x_n$. This information allows one to relate the verifier's workload to a variably chosen security bound. That is, on input a putative chain value the verifier determines a security level (i.e., security against adversaries with at most $T$ steps and success probability $\epsilon$) and performs only a fraction $p = p(T, \epsilon)$ of the original work by using the additional information. We also show lower bounds for the length of this extra information.

Keywords. Certificate, hash chain, hash function, hash tree.



1 Introduction 

A hash chain, introduced by Lamport [12], is a sequence of hash values $x_i = \mathrm{hash}(x_{i-1})$ for a seed $x_0$, where hash is some collision-intractable hash function (or some other publicly computable one-way function). Such a chain allows the owner of the seed to publish the chain's end value $x_n$ and to stepwise release the pre-images $x_{n-1}, x_{n-2}, \ldots$ such that revealing $x_{n-i}$ at step $i$ does not help to find any of the values $x_{n-i-1}, \ldots, x_0$. The receiver can check the validity of some received $x_{n-i}$ by re-calculating the chain starting with $x_{n-i}$ up to $x_n$.

Hash chains have numerous applications. One of the best known is Micali's suggestion to use them as certificate chains [14]. Roughly, for the user's public key $pk$ of a signature or encryption scheme, the certification authority (CA) publishes a certificate of $x_n$ and $pk$ (and possibly further information). For the $i$-th time period of some pre-determined length the CA hands the pre-image $x_{n-i}$ to the user, who can then provide this value as a certificate for his public key during this time period. To revoke the certificate the CA stops delivering the pre-images. Since it is infeasible to find the pre-image of the previously given value, forgery of certificates for future time periods is unlikely.

* This work was supported by the Emmy Noether Programme Fi 940/1-1 of the German Research Foundation (DFG).

Other application areas of hash chains include the design of micropayment schemes [10,17], the S/KEY one-time authentication (RFC 1760) [5,6], securing routing information (e.g., [9,7,8]), and spam-fighting protocols [4]. Similarly, one-way chains, where the hash function is replaced by a one-way function, have been deployed for the BiBa signature scheme [16] and for multicast authentication [15].

In some of the aforementioned areas the verification procedure can be shortened significantly. Namely, if the verifier stores a previously verified chain value $x_m$ for $m < n$, then the next time a value is presented, the verifier merely has to re-calculate the chain up to the stored value $x_m$. However, considering certificates for example, the owner of the seed may visit some sites only sporadically. Similarly, for routing protocols the information may be passed infrequently. Or, the verifier may not be able to store previous chain values due to memory limitations or other restrictions. Finally, in some solutions, like the anti-spam solution of Dwork et al. [4], the values are not released gradually but rather require the verifier to re-compute a full hash chain. Hence there are cases where large parts of the chain may still have to be verified.
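A minimal sketch of this shortcut, assuming SHA-256 from Python's `hashlib` as the hash function and indexing values by their position in the chain ($x_j$ is the $j$-th value): a verifier holding a trusted anchor $x_m$ only hashes a presented $x_j$ forward $m - j$ times instead of $n - j$ times. The names `make_chain` and `verify_against` are hypothetical.

```python
import hashlib

def H(x: bytes) -> bytes:
    """Stand-in for the hash function (assumption: SHA-256)."""
    return hashlib.sha256(x).digest()

def make_chain(seed: bytes, n: int) -> list:
    """Return [x_0, x_1, ..., x_n] with x_j = H(x_{j-1})."""
    chain = [seed]
    for _ in range(n):
        chain.append(H(chain[-1]))
    return chain

def verify_against(x: bytes, pos: int, anchor: bytes, anchor_pos: int):
    """Accept x as x_pos iff hashing it (anchor_pos - pos) times yields the
    trusted anchor x_anchor_pos. Returns (decision, hash evaluations)."""
    if pos > anchor_pos:
        return False, 0
    work = anchor_pos - pos
    for _ in range(work):
        x = H(x)
    return x == anchor, work

chain = make_chain(b"seed", 1000)
print(verify_against(chain[100], 100, chain[1000], 1000))  # anchor x_n: 900 hashes
print(verify_against(chain[100], 100, chain[200], 200))    # stored x_200: 100 hashes
```

The saving depends entirely on how recent the stored anchor is, which is exactly what fails in the sporadic-visit and low-memory scenarios just described.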

Related Results. The need for faster verification of hash chains immediately led to so-called hash trees [13]. Such constructs condense the long chains to tree-like structures such that the path from any value to the published root shrinks to logarithmic length. Unfortunately, in order to verify a given value, the user has to supply logarithmically many inner nodes of the tree as a proof of correctness. Hence, two of the advantages of hash chains, low communication complexity and structural simplicity, vanish and are traded for faster verification.

Interestingly, quite a few efforts have dealt with the problem of fast computation of intermediate values $x_i$, for both chains and trees [3,11,18]. That is, if the user only stores the seed $x_0$ then, in order to release $x_i$ of a hash chain, he needs to re-calculate the chain starting with $x_0$ up to $x_i$. It is preferable for the seed owner, of course, to keep some intermediate values confidentially, and to recover any $x_i$ from these values much faster.
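The simplest way to trade storage for recomputation, shown here only as an illustration and not as one of the optimal schemes of [3,11,18], is to keep every $s$-th chain value: any $x_i$ is then recovered with fewer than $s$ hash evaluations from about $n/s$ stored values. SHA-256 as the hash function and the helper names `checkpoints` and `recover` are assumptions of this sketch.

```python
import hashlib

def H(x: bytes) -> bytes:
    """Stand-in for the hash function (assumption: SHA-256)."""
    return hashlib.sha256(x).digest()

def checkpoints(seed: bytes, n: int, s: int) -> dict:
    """Keep x_j for every j in [0, n] that is a multiple of s."""
    store, x = {0: seed}, seed
    for j in range(1, n + 1):
        x = H(x)
        if j % s == 0:
            store[j] = x
    return store

def recover(store: dict, s: int, i: int):
    """Recompute x_i from the nearest stored x_j with j <= i.
    Returns (x_i, number of hash evaluations), which is at most s - 1."""
    j = (i // s) * s
    x = store[j]
    for _ in range(i - j):
        x = H(x)
    return x, i - j

store = checkpoints(b"seed", 1024, 32)   # 33 stored values
x_500, work = recover(store, 32, 500)    # starts from x_480: 20 hashes
```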

The results in [3,11,18] give constructions for storing and recovering intermediate values of chains and trees. They also give lower bounds showing that the constructions are optimal with respect to time/storage trade-offs. However, none of these solutions improves the verification time. Such an improvement is especially relevant for hash chains, where it would lessen the disadvantage of chains versus trees.

Our Results. Our solution is to let the owner of the seed generate some supplementary information which is published together with the chain's end value $x_n$. This extra information then allows the verifier to speed up the verification when presented with an allegedly correct chain value. Specifically, for security bounds $T$ and $\epsilon$ on the adversarial running time and success probability, respectively, our construction allows one to decide correctness after roughly a fraction $p = (\log T + \log\frac{1}{\epsilon})/100$ of the original workload. Here, the workload is the number of hash function evaluations (i.e., it is equal to $i$ if $x_{n-i}$ is given).

The interesting property of our construction is that the two security parameters $T$ and $\epsilon$ can be chosen individually by any verifier, even differently for each verification run. Once the verifier has selected "his" security level, this determines the fraction $p = p(T, \epsilon)$ of hash chain computations. In other words, the more liberal the security level the verifier chooses, the less work he has to carry out.

We emphasize that the security level $(T, \epsilon)$ for the fast verification should not be confused with the security of the hash function against collision-finders. As for collisions, we know that, by the birthday paradox, collisions for hash functions with $n$-bit output can be generated with probability more than 1/2 within $2^{n/2}$ steps. Once such a collision is found, the complete hash chain becomes compromised. In our model, we simply assume that finding such a collision is beyond feasible attacks. Security here refers to attacks in which the verifier is forced to perform more than a fraction $p(T, \epsilon)$ of the work; even if the adversary overcomes this security bound, the verifier may still raise the level for the next verification.

In our solution the extra information the seed owner attaches to $x_n$ is called a check-bit vector. As explained, this check-bit vector is a universal parameter enabling different security/workload levels for the verifiers. Another interesting characteristic, in addition to the time improvement, is the length of such check-bit vectors: very long vectors may outweigh the gain in verification time. We therefore investigate lower bounds for this length.

The bounds on the length of check-bit vectors vary with the way the vectors are created. In the simplest case the extra information is chosen according to the time period $i$ and merely consists of some fixed number of the bits of the intermediate value $x_{n-i}$. For this type of scheme, under which our construction falls, we show that approximately $(\log T - \log n + \log\frac{1}{\epsilon}) \log n$ bits are required. In comparison, our solution produces check-bit vectors of about $100 \log_2 n$ bits, which for $n = 1{,}024$, $T = 2^{40}$, $\epsilon = 2^{-20}$, and $p = 60\%$, for example, yields a respectable 1,000 bits. Still, this is slightly better than the usual $160 \log_2 n = 1{,}600$ bits to communicate the inner nodes of a tree, and the communication of the public check-bit vector amortizes over the time periods. Yet, hash trees are usually much faster to verify, in particular with respect to "standard" security levels.
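The figures in the previous paragraph follow from the approximate formulas stated there. The small Python sketch below assumes base-2 logarithms and 160-bit digests for the tree nodes; the helper names are hypothetical, and `approx_lower_bound` is only the rough form quoted above, not the exact bound of Section 3.

```python
import math

def construction_fraction(log2_T: float, log2_inv_eps: float) -> float:
    """Fraction p = (log T + log 1/eps) / 100 of the work for the construction."""
    return (log2_T + log2_inv_eps) / 100

def approx_lower_bound(log2_T: float, log2_inv_eps: float, n: int) -> float:
    """Approximate bound (log T - log n + log 1/eps) * log n for simple schemes."""
    return (log2_T - math.log2(n) + log2_inv_eps) * math.log2(n)

def check_vector_bits(n: int) -> float:
    """About 100 * log2 n check bits produced by the construction."""
    return 100 * math.log2(n)

def tree_path_bits(n: int, digest_bits: int = 160) -> float:
    """Inner-node bits needed to verify one leaf of a hash tree."""
    return digest_bits * math.log2(n)

n, log2_T, log2_inv_eps = 1024, 40, 20
print(construction_fraction(log2_T, log2_inv_eps))
print(approx_lower_bound(log2_T, log2_inv_eps, n))
print(check_vector_bits(n), tree_path_bits(n))
```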

Organization. In Section 2 we define check-bit schemes and their security for- 
mally. We present our lower bounds in Section 3 and our construction appears 
in Section 4. We conclude with a brief discussion in Section 5. 

2 Definition 

In the simplest form, a hash chain (for a given length parameter $n$) can be described by two algorithms $\mathcal{G}$ and $\mathcal{V}$, the generator and the verification algorithm. The former algorithm simply chooses a random $x_0$ and computes the chain up to $x_n$, and the verifier on input $x_n$ and some putative chain value $x$ for time period $i$ merely checks that $\mathrm{hash}^i(x) = x_n$.
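A minimal sketch of this basic pair of algorithms, assuming SHA-256 from Python's `hashlib` as the hash function and 32-byte random seeds; the function names are illustrative, and `release` models the seed owner handing out $x_{n-i}$ for time period $i$.

```python
import hashlib
import os

def H(x: bytes) -> bytes:
    """Stand-in for the hash function (assumption: SHA-256)."""
    return hashlib.sha256(x).digest()

def hash_iter(x: bytes, times: int) -> bytes:
    """Compute hash^times(x)."""
    for _ in range(times):
        x = H(x)
    return x

def generate(n: int):
    """G: pick a random seed x_0 and compute the end value x_n."""
    x0 = os.urandom(32)
    return x0, hash_iter(x0, n)

def release(x0: bytes, n: int, i: int) -> bytes:
    """Value x_{n-i} handed out by the seed owner for time period i."""
    return hash_iter(x0, n - i)

def verify(xn: bytes, x: bytes, i: int) -> bool:
    """V: accept x for time period i iff hash^i(x) == x_n (i evaluations)."""
    return hash_iter(x, i) == xn

x0, xn = generate(1024)
print(verify(xn, release(x0, 1024, 10), 10))   # True
print(verify(xn, release(x0, 1024, 10), 9))    # False
```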

Check-Bit Schemes. Here we augment the basic hash chain generation and verification. Algorithm $\mathcal{G}$, when generating the chain for seed $x_0$, repeatedly runs a deterministic selection algorithm $\mathcal{S}$ as a subroutine for each hash function iteration. For each such execution, for $i = n-1$ down to 0, algorithm $\mathcal{S}$ produces a string $\mathrm{cb}_i$ (possibly the empty string $\lambda$), which is determined by the time period number $i$, the intermediate value $x_{n-i} = \mathrm{hash}^{n-i}(x_0)$ and the preceding strings $\mathrm{cb}_{i+1}, \ldots, \mathrm{cb}_{n-1}$.

The so-called check-bit vector $\mathrm{cb}$ is the concatenation of all strings $\mathrm{cb}_0, \mathrm{cb}_1, \ldots, \mathrm{cb}_{n-1}$, ordered according to the release time. We assume that the position of $\mathrm{cb}_i$ within $\mathrm{cb}$ and its length are recoverable from $\mathrm{cb}$; this rules out lossy encodings, and we thus call the constructions allowing to recover $\mathrm{cb}_i$ schemes with lossless encoding. Since we only deal with such schemes throughout the paper, we often drop this qualifier. Let $\mathrm{cb}_{\geq i}$ be the string $\mathrm{cb}_i \| \ldots \| \mathrm{cb}_{n-1}$ and set $\mathrm{cb}_{>i} = \mathrm{cb}_{\geq i+1}$ for $i < n-1$ (where $\mathrm{cb}_{>n-1} = \lambda$).

As before, the verifier $\mathcal{V}$ takes a value $x$ and an integer $i$ together with the chain's end value as input, and verifies that $x$ is the correct pre-image for time period $i$. This time, however, the verifier also gets the check-bit vector $\mathrm{cb}$ as extra input and uses this value to shorten the verification: for each hash function iteration in time period $j$ the verifier now also calls $\mathcal{S}(j, \mathrm{hash}^{i-j}(x), \mathrm{cb}_{>j})$ and compares the result to the given $\mathrm{cb}_j$. If a mismatch occurs, it rejects; otherwise it continues (possibly up to the chain's end).

Moreover, the verifier gets two parameters $T$ and $\epsilon$ representing the bounds on the adversarial running time and success probability (both characteristics are specified below). Intuitively, one may think of these two variable security parameters as determined by $\mathcal{V}$ before starting the verification, although for ease of notation we sometimes set these parameters instead and then provide these fixed values to the verifier.

Definition 1. A check-bit scheme with lossless encoding and for parameter $n$ is a triple $(\mathcal{G}, \mathcal{V}, \mathcal{S})$ of algorithms (of which $\mathcal{G}$ is probabilistic) such that

Algorithm $\mathcal{G}$:

– picks a seed $x_0$ according to some efficiently samplable distribution,

– computes $x_i = \mathrm{hash}(x_{i-1})$ for $i = 1, 2, \ldots, n$,

– computes $\mathrm{cb}_i = \mathcal{S}(i, x_{n-i}, \mathrm{cb}_{>i})$ for $i = n-1, \ldots, 0$,

– outputs $(x_0, x_n, \mathrm{cb})$.

Algorithm $\mathcal{V}$:

– gets inputs $x_n$, $\mathrm{cb}$ and $x$, an integer $i$ as well as $T$ and $\epsilon$,

– repeats the following until $i = 0$ or halt:

  • if $\mathrm{cb}_i \neq \mathcal{S}(i, x, \mathrm{cb}_{>i})$ then reject and stop¹

  • else set $i \leftarrow i - 1$ and $x \leftarrow \mathrm{hash}(x)$

– if $x = x_n$ then accept, else reject.

¹ Here $\mathcal{V}$ recovers $\mathrm{cb}_i, \mathrm{cb}_{>i}$ from $\mathrm{cb}$.

Algorithm $\mathcal{S}$:

– takes an integer $i$, a value $x$ and a string $\mathrm{cb}_{>i}$ as input,

– computes and returns $\mathrm{cb}_i = \mathcal{S}(i, x, \mathrm{cb}_{>i})$.

In addition, the scheme is complete, i.e., the verifier never rejects a valid input $x_n, \mathrm{cb}, i$ and $x = x_{n-i}$ produced by $\mathcal{G}$, independently of $T$ and $\epsilon$.
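To make Definition 1 concrete, here is a toy instance in Python. The selection algorithm is a hypothetical position-driven one that outputs the lowest bit of $x_{n-i}$ for every period $i$ in a chosen set `positions` and the empty string otherwise; it is not the construction of Section 4. SHA-256 is assumed as the hash function, the check-bit vector is kept as a Python list so that each $\mathrm{cb}_i$ is trivially recoverable (lossless encoding), and the toy verifier omits $T$ and $\epsilon$ because this $\mathcal{S}$ does not use them.

```python
import hashlib
import os

def H(x: bytes) -> bytes:
    """Stand-in for the hash function (assumption: SHA-256)."""
    return hashlib.sha256(x).digest()

def S(i: int, x: bytes, cb_after: list, positions: set) -> str:
    """Hypothetical position-driven selection: one bit of x at chosen periods.
    cb_after (= cb_{>i}) is accepted to match Definition 1 but unused here."""
    return str(x[0] & 1) if i in positions else ""

def G(n: int, positions: set):
    """Generate x_0, x_n and the check-bit vector cb = [cb_0, ..., cb_{n-1}]."""
    chain = [os.urandom(32)]
    for _ in range(n):
        chain.append(H(chain[-1]))
    cb = [""] * n
    for i in range(n - 1, -1, -1):            # i = n-1 down to 0
        cb[i] = S(i, chain[n - i], cb[i + 1:], positions)
    return chain[0], chain[n], cb

def V(xn: bytes, cb: list, x: bytes, i: int, positions: set):
    """Verify x for period i; returns (decision, hash evaluations performed)."""
    work = 0
    while i > 0:
        if cb[i] != S(i, x, cb[i + 1:], positions):
            return False, work                # early rejection
        i, x = i - 1, H(x)
        work += 1
    return x == xn, work

positions = set(range(1, 64))                 # one check bit per period
x0, xn, cb = G(64, positions)
print(V(xn, cb, os.urandom(32), 40, positions))  # usually rejects after a few hashes
```

A correct value still costs the full $i$ hash evaluations; the check bits only let the verifier abort early on wrong values, which is the kind of work an adversary tries to inflate in the experiment that follows.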

Note that the selection algorithm $\mathcal{S}$ is defined to be deterministic. On the one hand, this simplifies the definition and analysis significantly. On the other hand, it does not weaken the model too much. Namely, for a given hash function hash define $\mathrm{hash}'(x_i \| r) = \mathrm{hash}(x_i) \| r$ such that $r$ remains unchanged during the iterations. If $x_0 \| r$ is chosen at random by $\mathcal{G}$ then $\mathcal{S}$ can use $r$ as externally provided random coins. This corresponds, of course, to public coins, as the right part of the chain's end value $x_n \| r$ is output, too. However, public randomness ensures that any verifier can re-calculate the selection algorithm's output and compare it to the given check-bit vector.

Attacks. In order to define security, we have to specify the attack mode first. We measure the running time $T$ of the adversary by counting the hash function evaluations only. Formally, we therefore provide the attacker with an oracle $\mathrm{hash}(\cdot)$ which she can access, but for which "guessing" images, i.e., generating images without querying the oracle, is infeasible. The next parameter $\epsilon \in [0, 1)$ basically represents a bound on the adversary's success probability. We also introduce a parameter $p \in [0, 1)$ which bounds the fraction of the original work the verifier performs. We define the following experiment for a check-bit scheme $(\mathcal{G}, \mathcal{V}, \mathcal{S})$ with parameter $n$:

Experiment $\mathrm{Exp}_{\mathcal{A}}(T, \epsilon, p)$:

– Algorithm $\mathcal{G}$ generates $(x_0, x_n, \mathrm{cb})$.

– The adversary $\mathcal{A}$ gets as input $(x_n, \mathrm{cb})$. The adversary also gets access to an oracle $\mathrm{Release}(\cdot)$ which takes integers $j$ as input and returns $x_{n-j}$. Let $r$ denote the minimum over all queries to Release (where $r = n$ if $\mathcal{A}$ has never queried the oracle).

– In addition to oracle queries, the adversary performs internal computations and finally outputs $(x, k)$.

– The verifier $\mathcal{V}$ is invoked on $(x_n, \mathrm{cb}, x, k, T, \epsilon)$ and returns the decision after $v$ hash function evaluations.

We say that adversary $\mathcal{A}$ wins experiment $\mathrm{Exp}_{\mathcal{A}}(T, \epsilon, p)$

– if the adversary makes at most $T$ hash function evaluations, and

– if the verifier makes $v \geq \lceil pk \rceil$ hash function evaluations, and

– if the adversary has queried the oracle Release only about values larger than $k$, i.e., if $k < r$.







M. Fischlin 



Security. Informally, a check-bit scheme is $(T, \epsilon, p)$-verifiable if no adversary running in time $T$ can cause the verifier to perform a fraction $p$ or more of the work with probability more than $\epsilon$. Here, the work refers to the number $k$ of hash function evaluations required to verify the correct value $x_{n-k}$ at time period $k$. As explained above, we usually envision the security bound as chosen by the verifier, and this bound then determines the required fraction of the work. In this sense, $p = p(T, \epsilon)$ is a function of the security level, and we call a check-bit scheme $p$-verifiable if for any $(T, \epsilon)$ it is $(T, \epsilon, p(T, \epsilon))$-verifiable. More formally:

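Definition 2. A check-bit scheme $(\mathcal{G}, \mathcal{V}, \mathcal{S})$ with parameter $n$ is called $(T, \epsilon, p)$-verifiable if, for any adversary $\mathcal{A}$ running in time at most $T$, the probability of $\mathcal{A}$ winning experiment $\mathrm{Exp}_{\mathcal{A}}(T, \epsilon, p)$ is at most $\epsilon$. The scheme is $p$-verifiable if for any adversary $\mathcal{A}$ and any $T, \epsilon$, the probability of $\mathcal{A}$ winning experiment $\mathrm{Exp}_{\mathcal{A}}(T, \epsilon, p(T, \epsilon))$ is at most $\epsilon$.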

We have chosen a relative bound to measure the work to be performed, i.e., if $p = 1/2$ then at time period $3n/4$ the verifier needs $3n/8$ hash evaluations, at time period $n$ the verifier has to compute $n/2$ hash values, etc. Alternatively, one may define an absolute bound saying that the verifier has to do at most $w = w(T, \epsilon)$ hash function evaluations, independently of the time period. But first note that such an absolute bound easily follows if we set $w = pn$. Second, some applications may rely on the fact that verification is faster for the first time periods. In this case, it is preferable to have a relative work reduction saying that you save up to 50%, for instance, at any time period.

3 Lower Bounds 

We first show a lower bound for special check-bit procedures in Section 3.1. This bound holds for arbitrary security parameters $T, \epsilon$ and thus even yields a bound for the more liberal case of $(T, \epsilon, p)$-verifiable schemes. The bound says that the selection algorithm $\mathcal{S}$ must essentially generate check-bit vectors of $(\log T - \log hn + \log\frac{1}{\epsilon}) \cdot \log_{1/(1-p)} n$ bits, where $h$ is the maximum number of hash function evaluations for each of the $n$ iterations (including the ones for the computation of $\mathcal{S}$).

The bound above holds for selection algorithms where the length of the output $\mathrm{cb}_i$ may depend on the position $i$ but not on the intermediate value. We call such schemes position-driven selection algorithms:

Definition 3. Let $(\mathcal{G}, \mathcal{V}, \mathcal{S})$ be a check-bit scheme (for parameter $n$). Algorithm $\mathcal{S}$ is position-driven if for any two seeds $x_0, y_0$ we have $|\mathrm{cb}_i(x_0)| = |\mathrm{cb}_i(y_0)|$ for all $i$.

In general, the length of $\mathrm{cb}_i$ may depend on the preceding values or check bits as well, and thus $|\mathrm{cb}_i(x_0)|$ can be different from $|\mathrm{cb}_i(y_0)|$. In this case, the generator $\mathcal{G}$ possibly outputs some seeds $x_0$ with very short check-bit vectors. For such schemes we nevertheless show in Section 3.2 that check-bit vectors with only a slightly smaller length than above must still be produced with high probability.
For both bounds, i.e., even if $|\mathrm{cb}_i(x_0)| \neq |\mathrm{cb}_i(y_0)|$, we make the following assumption, which basically says that the adversary will find matching check bits for random samples with at least the guessing probability.

Assumption 1. Let $(\mathcal{G}, \mathcal{V}, \mathcal{S})$ be a check-bit scheme with lossless encoding and parameter $n$. Then, for any $i$, we assume that for random $x_0, y_0$ the probability that $\mathrm{cb}_i(x_0) = \mathrm{cb}_i(y_0)$ is at least $2^{-\ldots}$ (the guessing probability). The probability is over the choice of $x_0$ and $y_0$.

3.1 Position-Driven Selection Algorithms 

Throughout this section we use the following notation (visualized in Figure 1). We let $[1, n]$ be the set of integers between 1 and $n$. Each integer $i$ represents the number of hash function evaluations that are required to verify a given value $x$ at time period $i$.

We divide $[1, n]$ into disjoint intervals. For this, let $\alpha_0, \ldots, \alpha_l$ be a sequence of increasing values with $\alpha_0 = 0$ and $\alpha_l = 1$ for an appropriate integer $l$ (which we will specify later). For $\ell = 1, \ldots, l$ define the $\ell$-th interval $I_\ell$ to be $[\alpha_{\ell-1} n + 1, \alpha_\ell n]$, where we assume for simplicity that all $\alpha_\ell n$'s are integers.

Let $(\mathcal{G}, \mathcal{V}, \mathcal{S})$ be a $(T, \epsilon, p)$-verifiable check-bit scheme with a position-driven selection algorithm. For a seed $x_0$ chosen by $\mathcal{G}$ let $c_\ell$ be the number of check-bit positions in the interval $I_\ell$. Note that, by assumption, $c_\ell$ does not depend on $x_0$. The sum over all $c_\ell$'s is therefore the total number of check bits for which we prove our lower bound.



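[Figure: time periods $i$ from 0 to $n$ (period $i$ releases $x_{n-i}$), divided at $\alpha_0 n, \alpha_1 n, \ldots, \alpha_{l-2} n, \alpha_{l-1} n$ into intervals, the last one being interval no. $l$.]

Fig. 1. Idea of Lower Bound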



In the sequel we set $q = 1 - p$ for $p > 0$ of the $(T, \epsilon, p)$-verifiable scheme, and we let $\alpha_\ell = q\,\alpha_{\ell+1}$ for $\ell = l-1, \ldots, 1$. Then $\alpha_\ell = q^{l-\ell}$ for $\ell \geq 1$, and each interval $I_\ell$ is by a factor $1/q$ larger than the previous one. Recall that we also assume that $\alpha_\ell n$ is an integer for all $\ell$; thus $n$ must be a power of $1/q$ and we must have $l = \log_{1/q} n$ for the number $l$ of intervals. For instance, for $p = 1/2$ we have $\log_2 n$ intervals, each one half the size of the following one.
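The interval construction can be written out with exact rational arithmetic. This sketch, using Python's standard `fractions` module and a hypothetical helper `intervals`, computes $l = \log_{1/q} n$ and the intervals $I_\ell$, and checks the property the proof relies on: if all check bits of $I_\ell$ match at time period $k = \alpha_\ell n$, the verifier performs at least $pk$ hash evaluations before it can reach a mismatch.

```python
from fractions import Fraction

def intervals(n, p):
    """Split [1, n] into I_1..I_l with alpha_0 = 0 and alpha_ell = q^(l - ell)
    for ell >= 1, where q = 1 - p (exact arithmetic via Fraction)."""
    p = Fraction(p)
    assert 0 < p < 1
    q = 1 - p
    l = 0
    while (1 / q) ** l < n:          # find l = log_{1/q} n
        l += 1
    assert (1 / q) ** l == n, "n must be a power of 1/q"
    alpha = [Fraction(0)] + [q ** (l - ell) for ell in range(1, l + 1)]
    return [(int(alpha[ell - 1] * n) + 1, int(alpha[ell] * n)) for ell in range(1, l + 1)]

p = Fraction(1, 2)
for lo, hi in intervals(16, p):
    # Matching all check bits of I_ell at period k = hi takes the verifier
    # through periods hi down to lo, i.e. hi - lo + 1 hash evaluations.
    assert hi - lo + 1 >= p * hi
    print(f"I = [{lo}, {hi}]")
```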

Let $h$ be the maximum of $\mathcal{G}$'s hash function evaluations when computing $x_i$ and $\mathrm{cb}_i$ in some $i$-th step. Then $h$ includes the single evaluation to derive the next chain value and at most $h - 1$ hash function computations of $\mathcal{S}$.

Lemma 1. We have $c_\ell \geq \log_2 T - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon}$ for all $\ell = 1, \ldots, l$.

Note that, for very small $\epsilon$, the latter term $\log_2 \ln\frac{1}{1-\epsilon}$ becomes roughly $\log_2 \epsilon$. Hence, the smaller the error should be, the more check bits are required.

Proof. Suppose that for some interval $I_\ell$ the number $c_\ell$ is strictly less than the given bound. We then show how to construct an adversary $\mathcal{A}$ that runs at most $T = 2^t$ steps and succeeds with probability more than $\epsilon$ in making the verifier evaluate a fraction $p$ or more of the $k = \alpha_\ell n$ iterations for $x_{n-\alpha_\ell n}$.

Adversary $\mathcal{A}$ repeats the following at most $r = T/hn$ times. $\mathcal{A}$ selects a random seed $y_0$ and iterates the hash function until all $c_\ell$ check bits in interval $I_\ell$ have been computed. If these check bits match the original ones, then $\mathcal{A}$ outputs $x = \mathrm{hash}^{n-\alpha_\ell n}(y_0)$ and stops; otherwise it repeats.

Note that the adversary's running time is certainly bounded above by $T$. This holds since the computation of the $c_\ell$ check bits via the position-driven selection algorithm in each round requires at most $hn$ hash function iterations, and since the number of repetitions is at most $r$.

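It remains to calculate the success probability. In each loop the probability
of $A$ finding a value $x$ for which the check bits match is, by Assumption 1, at
least $2^{-c_\ell}$. Hence, the probability that $A$ does not find a suitable $x$ during all $r$
rounds is at most:

$$\left(1 - 2^{-c_\ell}\right)^r \le \exp\left(-\frac{T}{hn}\, 2^{-c_\ell}\right) < \exp\left(-\frac{2^t}{hn}\, 2^{-(t - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon})}\right) = \exp\left(-2^{\log_2 \ln\frac{1}{1-\epsilon}}\right) = \exp\left(-\ln\frac{1}{1-\epsilon}\right) = 1-\epsilon$$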



The probability of A finding such an x is therefore strictly more than e, contra- 
dicting the security of the scheme. Therefore, the assumption about falling 
below the bound must be false. □ 
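The key inequality of this proof can be checked numerically. The sketch below plugs in illustrative parameters (an assumption: $T = 2^{40}$, $hn = 2^{10}$, $\epsilon = 2^{-20}$, as in the example further below) and computes the probability that $r = T/hn$ independent trials match $c$ check bits at least once, modelling each trial as succeeding with probability exactly $2^{-c}$, the lower bound from Assumption 1. `math.log1p` and `math.expm1` keep the tiny probabilities accurate; the function name is hypothetical.

```python
import math


def adversary_success(c: int, r: float) -> float:
    """P[at least one of r trials matches c check bits], each trial succeeding w.p. 2**-c."""
    # 1 - (1 - 2**-c)**r, evaluated stably for tiny 2**-c
    return -math.expm1(r * math.log1p(-(2.0 ** -c)))


t, hn, eps = 40, 2 ** 10, 2.0 ** -20
r = 2 ** t / hn  # number of repetitions T / (hn)
bound = t - math.log2(hn) - math.log2(-math.log1p(-eps))  # Lemma 1, just below 50

for c in (49, 50):
    print(c, c < bound, adversary_success(c, r) > eps)
```

With $c = 49$, below the bound, the attack succeeds with probability above $\epsilon$, which is the contradiction the proof derives; with $c = 50$ it does not.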

We immediately get from the previous lemma: 

Theorem 2. Let $(\mathcal{G}, V, S)$ be a $(T, \epsilon, p)$-verifiable check-bit scheme with lossless
encoding and parameter $n$. Let $S$ be a position-driven selection algorithm and
assume that Assumption 1 holds. Presume further that the computation of a
chain of length $n$ requires at most $hn$ hash function evaluations. Then the length
of the check-bit vector is at least

$$\left(\log_2 T - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon}\right) \log_{1/(1-p)} n$$

Proof. According to the lemma, for each interval $\mathcal{I}_\ell$ we have for the number of
check bits:

$$c_\ell \ge t - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon}$$

It follows for the overall number of check bits:

$$\sum_{\ell=1}^{I} c_\ell \ge \left(t - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon}\right) \log_{1/(1-p)} n$$

This proves the lower bound. □

For example, if $n = 1{,}024$, $h = 1$ and the verifier chooses a security level of
$T = 2^{40}$ and $\epsilon = 2^{-20}$, then for $p = 1/2 = q$ we need approximately
$(40 - 10 + 20) \cdot 10 = 500$ bits.
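The bound of Theorem 2 is easy to evaluate for other parameters. This hypothetical helper reproduces the example; it writes $\ln\frac{1}{1-\epsilon}$ as `-log1p(-eps)` so that very small $\epsilon$ does not lose precision.

```python
import math


def check_bit_lower_bound(log2_T: float, n: int, h: int, eps: float, p: float) -> float:
    """Theorem 2: (log2 T - log2 hn - log2 ln(1/(1-eps))) * log_{1/(1-p)} n."""
    # ln(1/(1 - eps)) == -log1p(-eps), which stays accurate for tiny eps
    per_interval = log2_T - math.log2(h * n) - math.log2(-math.log1p(-eps))
    num_intervals = math.log(n) / math.log(1 / (1 - p))
    return per_interval * num_intervals


print(round(check_bit_lower_bound(40, 1024, 1, 2.0 ** -20, 0.5)))  # 500
```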

3.2 General Check-Bit Schemes 

For non-position-driven selection algorithms the size of the output $\mathrm{cb}_i$ may vary
with the intermediate values. Luckily, we can modify the proof above to obtain
a slightly relaxed bound.

Take all the values $\alpha_\ell$, $\mathcal{I}_\ell$ etc. as in the previous section and let $(\mathcal{G}, V, S)$ be a
check-bit scheme, not necessarily with a position-driven selection algorithm. Let
$c_\ell$ denote again the number of check bits in interval $\mathcal{I}_\ell$, which now is a random
variable over $\mathcal{G}$'s choice. In addition, fix some constant $a \in (0, 1)$.

Lemma 2. The probability that $\mathcal{G}$ picks a seed $x_0$ such that

$$c_\ell \ge \log_2 T - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon^{1-a}} \quad \text{for all } \ell = 1, \dots, I$$

is at least $1 - I\epsilon^a$.

Substituting $\log_2 \ln\frac{1}{1-\epsilon^{1-a}}$ by the approximation $\log_2 \epsilon^{1-a}$ again, the success
probability now enters as $(1 - a) \log_2 \epsilon$. Hence, the smaller $a$, the larger the vector
length, but also the smaller the probability of outputting such a long vector.

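Proof. Suppose for the sake of contradiction that this probability is less than $1 - I\epsilon^a$.
Then, by the union bound over the $I$ intervals, there exists a fixed $\ell_0$ such that the
probability of $\mathcal{G}$ picking a seed $x_0$ such that

$$c_{\ell_0} < \mathrm{bound}_{\ell_0} := \log_2 T - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon^{1-a}}$$

is at least $\epsilon^a$.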

Next, as in the previous case, we construct an adversary $A$ trying to cause
more than a fraction $p$ of the work for interval $\mathcal{I}_{\ell_0}$ with probability more than $\epsilon$. $A$
repeats the following $r = T/hn$ times. $A$ selects a random seed $y_0$ and computes
the chain for this seed up to time period $\alpha_{\ell_0} n$. Let $x$ be $\mathrm{hash}^{\,n-\alpha_{\ell_0} n}(y_0)$. The
adversary continues to iterate the hash function $(\alpha_{\ell_0} - \alpha_{\ell_0-1})n$ times. If the
check bits do not match the given ones, then it repeats the process; otherwise it returns $x$.

The running time of $A$ is bounded above by $T$ since the adversary makes at
most $hn$ hash function iterations for each of the $r$ tries. As for the success
probability, condition on the event that $\mathcal{G}$ outputs some $x_0$ for which $c_{\ell_0} < \mathrm{bound}_{\ell_0}$.
This happens with probability at least $\epsilon^a$. Next note that the adversary
succeeds if the at most $\mathrm{bound}_{\ell_0}$ bits of the attempt match. This happens with
probability at least $2^{-\mathrm{bound}_{\ell_0}}$ according to Assumption 1.

Hence, under the condition that $\mathcal{G}$'s seed $x_0$ causes $c_{\ell_0}$ to be less than the
bound, it follows as before that $A$ fails with probability less than

$$\left(1 - 2^{-\mathrm{bound}_{\ell_0}}\right)^r \le 1 - \epsilon^{1-a}$$

The probability that $A$ succeeds in the experiment is therefore more than $\epsilon^{1-a}$
times the probability that $c_{\ell_0} < \mathrm{bound}_{\ell_0}$ for $\mathcal{G}$'s output. Multiplying these two
probabilities we obtain a successful attack with probability more than $\epsilon$. Thus
the initial assumption must have been wrong. □

Theorem 3. Let $(\mathcal{G}, V, S)$ be a $(T, \epsilon, p)$-verifiable check-bit scheme with lossless
encoding and parameter $n$. Presume that the computation of a chain of length $n$
requires at most $hn$ hash function evaluations and let $a \in (0, 1)$ be a constant.
Then, under Assumption 1, with probability at least $1 - \epsilon^a \log_{1/(1-p)} n$ (over $\mathcal{G}$'s
seed choice) the check-bit vector has at least

$$\left(\log_2 T - \log_2 hn - \log_2 \ln\frac{1}{1-\epsilon^{1-a}}\right) \log_{1/(1-p)} n$$

bits.
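To see the trade-off controlled by $a$, the hypothetical helper below evaluates both quantities of Theorem 3 for the running example ($T = 2^{40}$, $h = 1$, $n = 1{,}024$, $\epsilon = 2^{-20}$, $p = 1/2$, chosen here for illustration).

```python
import math


def theorem3(log2_T: float, n: int, h: int, eps: float, p: float, a: float):
    """Return (minimum vector length in bits, probability over G's seed) from Theorem 3."""
    num_intervals = math.log(n) / math.log(1 / (1 - p))
    per_interval = (log2_T - math.log2(h * n)
                    - math.log2(-math.log1p(-eps ** (1 - a))))
    return per_interval * num_intervals, 1 - eps ** a * num_intervals


for a in (0.25, 0.5, 0.75):
    bits, prob = theorem3(40, 1024, 1, 2.0 ** -20, 0.5, a)
    print(f"a={a}: at least {bits:.1f} bits with probability {prob:.4f}")
```

Increasing $a$ shortens the guaranteed vector length but raises the probability with which the guarantee holds, as stated after Lemma 2.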

4 Constructions of Check-Bit Schemes 

In this section we present our check-bit scheme. We start with an elementary
attempt which provides an absolute work bound of $w = \log_2 T + \log_2 \frac{1}{\epsilon}$ hash
function evaluations for the desired security parameters. However, the relative
performance (relative to the time period and the original number of hash function
evaluations) is rather poor, so we elaborate on a construction with relative bound
$p = (\log_2 T + \log_2 \frac{1}{\epsilon})/100$. This, unfortunately, comes with an increase in the
length of the check-bit vector.

4.1 Construction with Absolute Bound 

In our construction with absolute bound the selection algorithm $S$ simply outputs
the least significant bit of intermediate value $x_{n-i}$ for each percent of computation
(i.e., if $i = \lfloor jn/100 \rfloor$ for some $j$). Here, the value 100 is chosen rather
arbitrarily; any other granularity may be selected as well. The verifier, when
checking some input $(x, i)$, then merely compares the least significant bits of the
intermediate values when re-calculating the chain, and stops if a mismatch occurs.

Construction 4. The check-bit scheme $(\mathcal{G}_{abs}, V_{abs}, S_{abs})$ with parameter $n > 100$
is described by the following selection algorithm:

Algorithm $S_{abs}(x, i)$:

  if $i = \lfloor jn/100 \rfloor$ for some $j \in \{1, \dots, 100\}$
  then output $\mathrm{cb}_i$ = [least significant bit of $x$]
  else output $\mathrm{cb}_i = \lambda$
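To make the absolute construction concrete, here is a minimal sketch of the generator, $S_{abs}$ and the verifier. Several details are assumptions not fixed by the text: SHA-256 stands in for hash, chain values are indexed from $x_0$ (the seed) to $x_n$ (the end value) with time period $i$ corresponding to $x_{n-i}$, the least significant bit of a digest is taken from its last byte, and all function names are hypothetical.

```python
import hashlib


def hash_fn(x: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the abstract hash function.
    return hashlib.sha256(x).digest()


def lsb(x: bytes) -> int:
    # Assumption: the digest is read big-endian, so the LSB sits in the last byte.
    return x[-1] & 1


def s_abs(x: bytes, i: int, n: int) -> str:
    """Selection algorithm S_abs: one bit at each percent of the chain, else empty."""
    if any(i == (j * n) // 100 for j in range(1, 101)):
        return str(lsb(x))
    return ""  # the empty output lambda


def generate(seed: bytes, n: int) -> tuple[bytes, dict[int, str]]:
    """Compute x_0 = seed, ..., x_n; return the end value and the check bits by time period."""
    chain = [seed]
    for _ in range(n):
        chain.append(hash_fn(chain[-1]))
    check_bits = {}
    for i in range(1, n + 1):
        bit = s_abs(chain[n - i], i, n)  # time period i corresponds to x_{n-i}
        if bit:
            check_bits[i] = bit
    return chain[n], check_bits


def verify(x: bytes, k: int, end_value: bytes, check_bits: dict[int, str]) -> bool:
    """Check a claimed x_{n-k} by hashing towards the end value, aborting on a mismatch."""
    current = x
    for i in range(k, 0, -1):
        # current is the candidate for x_{n-i}
        if i in check_bits and str(lsb(current)) != check_bits[i]:
            return False  # early rejection saves the remaining hash evaluations
        current = hash_fn(current)
    return current == end_value
```

For example, `generate(b"seed", 1000)` yields 100 check bits, and a random candidate for time period 250 is usually rejected after only a few hash evaluations.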

Note that the length of the check-bit vector is constant and adds 100 bits to 
the public chain’s end value of typically 160 or 256 bits. 

The idea of the scheme is as follows. Suppose that the distribution of the bits
is approximately uniform, and that the adversary cannot do better than computing
chains for randomly chosen seeds. Then, for such a seed the probability
of hitting $w = \log_2 T + \log_2 \frac{1}{\epsilon}$ of the given check bits is at most $2^{-w} = \epsilon T^{-1}$.
Hence, the overall success probability of the adversary making $T$ or fewer steps is
at most $\epsilon$.

The scheme, as is, does not provide a reasonable relative security level,
though. For instance, consider the time period $k$ which is $\log_2 \frac{1}{\epsilon} - 1$ percent
from the end value $n$. An adversary that outputs a random $x$ together with $k$
makes $V$ evaluate the whole chain up to the end with probability $2\epsilon$ (because
there are at most $\log_2 \frac{1}{\epsilon} - 1$ check bits in this interval). Hence, for any
given $\epsilon$ the verifier performs 100% of the original computation with probability
more than $\epsilon$ for some point $k$. Otherwise the length of the vector could not go
below our lower bound.

Because of our interest in relative bounds we omit a formal security statement 
and analysis of this scheme here and turn to the next construction instead. 



4.2 Construction with Relative Bound 

The problem with the approach in the previous subsection is that the check bits 
are distributed equidistantly over the chain of length n. Yet, the workload of 
the verifier varies with the distance to the end value and is thus relative to the 
position. The idea is now to increase the density of check bits towards the end of 
the chain such that the number of check bits compensates for the reduced work 
towards the first time periods. 

We partition the chain of length $n$ into $I = \log_2 n$ intervals of length
$1, 2, 4, \dots, n/4, n/2$. For ease of notation we presume that $n$ is a power of 2.
For $\ell = 1, \dots, I$ interval $\mathcal{I}_\ell$ ranges from $2^{\ell-1} + 1$ to $2^\ell$. In interval $\mathcal{I}_I$ we let $S_{rel}$
output the least significant bit of the intermediate values at positions $jn/100$.
Again, any other base instead of 100 may be chosen. In interval $\mathcal{I}_{I-1}$ we double
the check bits by outputting the bits of each value at position $jn/200$. In general,
we output the least significant bit of value $x_{n-i}$ for $i \in \mathcal{I}_\ell$ if $i = jn/(100 \cdot 2^{I-\ell})$.

Another refinement is to return the b least significant bits instead of a single 
one only. This improves the error detection probability. We thus define our check- 
bit scheme with respect to a parameter b which can be an arbitrary integer but 
which is fixed for a specific instance. 

Construction 5. The check-bit scheme $(\mathcal{G}_{rel,b}, V_{rel,b}, S_{rel,b})$ with parameter $n > 100$
is described by the following selection algorithm:

Algorithm $S_{rel,b}(x, i)$:

  if $i \in \mathcal{I}_\ell$ and $i = jn/(100 \cdot 2^{I-\ell})$ for some $j \in \{51, \dots, 100\}$
  then output $\mathrm{cb}_i$ = [$b$ least significant bits of $x$]
  else output $\mathrm{cb}_i = \lambda$

For each interval $\mathcal{I}_\ell$ the variable $j$ runs through 50 values, each of which produces
$b$ output bits. Hence, the overall length of a check-bit vector is given by:

$$50b \cdot I = 50b \cdot \log_2 n$$

If we choose $b = 2$, for instance, then we get a check-bit vector of $100 \log_2 n$ bits,
and for $n = 1{,}024$ the check-bit vector is 1,000 bits.

In order to show security we first need to specify the assumption about the
hash function, or more precisely, about the bits we output:

Assumption 6. For any two seeds $x_0 \ne y_0$ we assume that the check-bit vectors
$\mathrm{cb}(x_0)$ and $\mathrm{cb}(y_0)$ generated by $S_{rel,b}$ are uniformly and independently distributed
strings of the corresponding length (where the probability is over the choice of
the hash function hash).

This assumption is (almost) satisfied if hash is for example modelled as a 
random oracle [2]. In this case, the bits are uniformly and independently dis- 
tributed as long as no intermediate collisions occur. Such collisions are, however, 
very unlikely and happen only with negligible probability. 

Also note how this assumption captures the adaptive queries of the adversary
to Release(·) in experiment $\mathrm{Exp}_A(T, \epsilon, p)$. Specifically, the assumption quantifies
over all seeds and thus, even if the adversary knows a seed $x_0$ generated by $\mathcal{G}_{rel,b}$,
it is infeasible to find another seed complying with (parts of) $\mathrm{cb}(x_0)$ better than
with trial-and-error. Also, even if given pre-images of $x_n$ and the check-bit vector
$\mathrm{cb}(x_0)$, it remains infeasible to find another preceding pre-image.

From a practical point of view, well-known hash functions like SHA-1 and
RIPEMD-160 seem to approximate this assumption quite well. To the best of our
knowledge the distribution of the least significant bits is not known to be biased
significantly. Similarly, providing very few bits of a pre-image is not known to
substantially help in inverting the hash function. See [1] for results.

Theorem 7. Under Assumption 6 the check-bit scheme $(\mathcal{G}_{rel,b}, V_{rel,b}, S_{rel,b})$ in
Construction 5 constitutes a $p$-verifiable check-bit scheme for

$$p = \frac{1}{50b}\left(\log_2 T + \log_2 \frac{1}{\epsilon}\right)$$

For chains of length $n$ the scheme generates check-bit vectors of length $50b \cdot \log_2 n$.

Proof. Assume that the adversary's final output is a pair $(x, k)$. Let $\ell$ be the
interval number in which $k$ lies, i.e.,

$$\frac{n}{2 \cdot 2^{I-\ell}} < k \le \frac{n}{2^{I-\ell}}$$

Then, one percent of the work to verify the pair $(x, k)$ corresponds to at least
$\frac{1}{2 \cdot 2^{I-\ell}}$ percent of the work to verify the whole chain. By construction, on the
other hand, each interval $\mathcal{I}_i$ with $i \le \ell$ contains $b2^{I-i} \ge b2^{I-\ell}$ check bits per
percent of the chain. Hence, if we perform $100p$ percent of the verification work for
$(x, k)$, then we consult at least $100p \cdot \frac{1}{2} \cdot 2^{\ell-I} \cdot b2^{I-\ell} = 100bp/2$ check bits in total.

By the bound on the running time the adversary can probe at most $T$ values
$(x, i)$ during the experimental phase. The probability that a specific one of these
values matches the first $c$ given check bits $\mathrm{cb}_{>i}$ is by assumption at most $2^{-c}$.
Hence, the probability that any of the at most $T$ samples matches those bits is
bounded above by $T \cdot 2^{-c}$. Together with the fact above, the probability of the
adversary finding a value matching $100bp/2$ check bits is at most

$$T \cdot 2^{-100bp/2} = T \cdot 2^{-\log_2 (T/\epsilon)} = \epsilon$$

The length of the check-bit vector has already been discussed above. □ 

Returning to our example with $n = 1{,}024$ and $b = 2$, for $T = 2^{40}$ and
$\epsilon = 2^{-20}$ the verifier requires about $p = 60\%$ of the original workload. For $b = 3$
and vectors of 1,500 bits the work in this case even reduces to 40%.
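The numbers in this example follow from the vector length $50b \log_2 n$ and from setting $T \cdot 2^{-100bp/2} = \epsilon$ as in the proof above, i.e. $p = (\log_2 T + \log_2 \frac{1}{\epsilon})/(50b)$. The hypothetical helpers below reproduce them.

```python
import math


def relative_work(log2_T: float, eps: float, b: int) -> float:
    """Relative bound p = (log2 T + log2(1/eps)) / (50 b)."""
    return (log2_T + math.log2(1 / eps)) / (50 * b)


def vector_length(n: int, b: int) -> int:
    """Construction 5: 50 b log2 n check bits (n a power of two)."""
    return 50 * b * (n.bit_length() - 1)


for b in (2, 3):
    print(b, vector_length(1024, b), relative_work(40, 2.0 ** -20, b))
```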

5 Discussion 

We have presented constructions to improve the verification time of hash chains. 
Our solutions enable the verifier to select a flexible security level and to relate the 
work to be done to this security level. Our constructions and lower bounds rely on 
so-called check-bit schemes where basically some bits of the intermediate values 
are output. Fortunately, such schemes are very simple and can be integrated 
quite easily; they preserve the simplicity of hash chains and are applicable in 
general. On the downside, as we have shown, such schemes cannot go below
certain bounds when it comes to the length of the check-bit vectors.

It remains an open problem to provide other check-bit schemes with shorter 
vectors, e.g., by using lossy encoding techniques. Yet, such schemes should be
about as simple as the basic scheme in this paper; otherwise the running
time may be dominated by the additional effort, invalidating the benefits of
faster verification. Similarly, it would be interesting to show lower bounds for 
more general check-bit schemes. 
Abstract. A drawback of visual cryptography schemes (VCS) is the large
loss of contrast in the reconstructed image. This paper shows that almost
no loss of contrast can be achieved if we are allowed to use a very
simple non-cryptographic operation, reversing black and white. Many
copy machines have this function these days. Therefore, our VCS is very
attractive.

Keywords: Visual cryptography, ideal contrast, perfect black 



1 Introduction 

Visual cryptography schemes (VCS) were introduced by Naor and Shamir [9] and 
have been studied by many researchers [1,3,4,5,7,11]. A $(k,n)$-VCS is a method
to encode a secret image $I$ into $n$ transparencies, where each participant receives
one transparency. In the reconstruction phase, any $k$ participants can recover
the secret image by superimposing their transparencies. However, any $k - 1$
participants have no information on $I$. This can be done without any knowledge
of cryptography and without performing any cryptographic operations.

A drawback of these schemes is the large loss of contrast in the reconstructed
image. In a (2,2)-VCS [9], a black pixel is translated into a black region but a
white pixel is translated into a grey region (half black and half white).

On the other hand, Blundo et al. showed how to construct a perfect black
$(k, n)$-VCS for any $2 \le k \le n$, where the reconstruction of the black region is perfect
[6,3]. However, the reconstruction of the white region is very dark.

In this paper, we show that almost no loss of contrast can be achieved if we
are allowed to use a very simple non-cryptographic operation, reversing black
and white. That is, all the black region is reversed to white and all the white
region is reversed to black. Many copy machines have this function these days.
Therefore, our VCS is very attractive. We call our construction a $(k,n)$-VCS
with reversing.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 353-365, 2004. 
We first show a perfect black $(k,n)$-VCS with reversing such that the
reconstruction of the white region is almost perfect. This means that the contrast is
almost ideal. The cost we have to pay is the size of shares. If the size of shares is $c$
times larger, then the grey level of the white region converges to zero exponentially.

We next show how to convert a perfect black $(k,n)$-VCS (with reversing) to
a perfect white $(k,n)$-VCS with reversing. Perfect white VCSs are preferable to
perfect black VCSs because the white region is much larger than
the black region in usual images. From our first result, we can obtain a perfect
white $(k,n)$-VCS with reversing such that the reconstruction of the black region is
almost perfect.

We finally show a perfect black VCS for any monotone access structure.
This means that we can obtain a VCS with reversing for any monotone access
structure such that the contrast is almost ideal. (Perfect black VCSs have been
known only for $(k, n)$-threshold cases so far.)

Finding another simple non-cryptographic operation which can achieve almost
ideal contrast is left as future work.

Related work: Naor and Shamir showed an improved scheme in [10]. However, 
it works only for (2,2)-VCS. 

2 Preliminaries 

For a random variable $X$, $E[X]$ denotes the expected value and $\mathrm{Var}[X]$ denotes
the variance. We sometimes use $+$ to express OR.

2.1 Model 

A $(k,n)$-visual cryptography scheme (VCS) consists of a distribution phase and
a reconstruction phase. Let $I$ be a secret image which consists of black and white
pixels $P$.

In the distribution phase, a dealer $\mathcal{D}$ encodes each pixel $P$ into $n$ shares
$s_1, \dots, s_n$, one for each transparency. $\mathcal{D}$ then gives $s_i$ to participant $P_i$ for
$i = 1, \dots, n$.

In the reconstruction phase, any $k$ participants $P_{i_1}, \dots, P_{i_k}$ reconstruct $I$ by
superimposing their transparencies. That is, the reconstructed pixel is given by

$$P = s_{i_1} + s_{i_2} + \cdots + s_{i_k},$$

where $+$ means OR. However, any $k - 1$ participants have no information on $I$.

Each $s_i$ consists of $m$ sub-pixels, where $m$ is called the expansion rate. Hence
$s_i$ is described by a Boolean vector of length $m$

$$s_i = (c_{i,1}, \dots, c_{i,m}),$$

where $c_{ij} = 1$ if the $j$-th sub-pixel in $s_i$ is black. Let $C = [c_{ij}]$ be the $n \times m$
Boolean matrix which consists of $s_1, \dots, s_n$. We say that $C$ is the encoding matrix
of $P$.
Usually, the dealer $\mathcal{D}$ computes the encoding matrix $C$ of a pixel $P$ from
two matrices $M_0$ and $M_1$ as follows: $C$ is obtained by randomly permuting the
columns of $M_0$ if $P$ is white and by randomly permuting the columns of $M_1$ if
$P$ is black. $M_0$ and $M_1$ are called the basis matrices.

$P$ is interpreted as black if $w_H(P)$ is large, and as white if $w_H(P)$ is small,
where $w_H(P)$ denotes the Hamming weight of $P$. We define the grey level of a
pixel $P$ as

$$\mathrm{GREY}(P) = w_H(P)/m,$$

where $P$ = white or black.

Therefore, GREY(white) should be close to zero and GREY(black) should be
close to one. The contrast is ideal if

$$\mathrm{GREY}(\text{white}) = 0 \text{ and } \mathrm{GREY}(\text{black}) = 1.$$



2.2 Naor-Shamir (2,2)-VCS 

Naor and Shamir showed the first $(k,n)$-VCS [9]. Fig 1 illustrates their (2,2)-VCS.

In the distribution phase, each pixel $P$ is split into two sub-pixels in each of
the two shares $s_1$ and $s_2$. If $P$ is white, then the dealer $\mathcal{D}$ randomly chooses one
of the first two rows of Fig 1. If $P$ is black, then $\mathcal{D}$ randomly chooses one of the
last two rows of Fig 1. $\mathcal{D}$ then gives $s_1$ to participant $P_1$ and $s_2$ to participant $P_2$.

In other words, the basis matrices are

$$M_0 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \quad M_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \qquad (1)$$

The dealer $\mathcal{D}$ computes the encoding matrix $C$ of a pixel $P$ by randomly permuting
the columns of $M_0$ if $P$ is white and by randomly permuting the columns
of $M_1$ if $P$ is black.

In the reconstruction phase, the two participants superimpose $s_1$ and $s_2$. If
$P$ is black, then they get two black sub-pixels; if $P$ is white, then they get one
black sub-pixel and one white sub-pixel. Therefore,

$$\mathrm{GREY}(\text{black}) = 1, \quad \mathrm{GREY}(\text{white}) = 1/2.$$
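The (2,2) scheme is small enough to simulate directly. This sketch represents shares as lists of sub-pixels (1 = black), draws the random column permutation with Python's `random` module, and models stacking transparencies as a sub-pixel-wise OR; the function names are illustrative.

```python
import random

# Basis matrices of Eq. (1): row r is the share of participant P_{r+1}; 1 = black.
M0 = [[1, 0], [1, 0]]  # white pixel
M1 = [[1, 0], [0, 1]]  # black pixel


def encode(pixel_is_black: bool, rng: random.Random) -> list[list[int]]:
    """Return the encoding matrix C: the basis matrix with its columns randomly permuted."""
    basis = M1 if pixel_is_black else M0
    order = list(range(len(basis[0])))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in basis]


def superimpose(*shares: list[int]) -> list[int]:
    """Stacking transparencies: a sub-pixel is black if it is black in any share (OR)."""
    return [int(any(bits)) for bits in zip(*shares)]


def grey(vector: list[int]) -> float:
    """GREY = Hamming weight / m."""
    return sum(vector) / len(vector)


rng = random.Random(2004)
s1, s2 = encode(False, rng)
print(grey(superimpose(s1, s2)))  # always 0.5 for a white pixel
```

Each single share has exactly one black sub-pixel whatever the secret pixel is, which is why one participant alone learns nothing.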



2.3 Perfect Black VCS 

We say that a $(k, n)$-VCS is perfect black if

$$\mathrm{GREY}(\text{black}) = 1 \text{ and } \mathrm{GREY}(\text{white}) < 1.$$

The $(n, n)$-VCS shown by Naor and Shamir [9] is perfect black. The expansion
rate is $m = 2^{n-1}$ and they showed that it is optimal.

For any $2 \le k \le n$, Blundo et al. showed a perfect black $(k, n)$-VCS such
that

$$\mathrm{GREY}(\text{white}) = 1 - 1/m$$

for some expansion rate $m$ [6].
3 Basic Construction

In this section, we show a basic construction of our schemes. We present a 
perfect black (2,2)-VCS with reversing such that $E[\mathrm{GREY}(\text{white})] = 1/4$. Since
GREY(white) = 1/2 in the Naor-Shamir (2,2)-VCS, the contrast is improved.

Definition 1. We say that an image I is reversed if all black pixels are reversed 
into white and all white pixels are reversed into black. 

Let $\overline{P}$ denote the reversed pixel of $P$. Our scheme is illustrated in Fig 2 and
Fig 3.

(Distribution phase) A dealer $\mathcal{D}$ runs the distribution phase of the Naor-Shamir
(2,2)-VCS twice independently. Let $(s_1, s_2)$ be the shares of the first run and
$(s'_1, s'_2)$ be the shares of the second run. Then the share of participant $P_1$ of our
VCS is $(s_1, s'_1)$ and that of participant $P_2$ is $(s_2, s'_2)$. See Fig 2.



(Reconstruction phase)

Step 1. The two participants superimpose $s_1, s_2$ and obtain $T = s_1 + s_2$. Similarly,
they superimpose $s'_1, s'_2$ and obtain $T' = s'_1 + s'_2$. They are illustrated in the
last columns of Fig 2(a) and Fig 2(b).

Step 2. They next reverse $T, T'$ and obtain $\overline{T}$ and $\overline{T'}$ as shown in Fig 3.

Consider a pixel $P$.

- If $P$ is black, then $T$ and $T'$ are all black. Therefore, $\overline{T}$ and $\overline{T'}$ are all white.

- If $P$ is white, then $T$ and $T'$ are grey such that a half region is black and
the other half is white in each one of the four cases. Therefore, $\overline{T}$ and $\overline{T'}$ are
also grey such that a half region is white and the other half is black in each
one of the four cases.

Step 3. The two participants superimpose $\overline{T}, \overline{T'}$ and obtain $\overline{T} + \overline{T'}$.



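[Figure: table with columns pixel $P$, $s_1$, $s_2$, $s_1 + s_2$; for a white and for a black pixel, each of the two possible share patterns is chosen with probability $p = .5$.]

Fig. 1. Naor-Shamir 2-out-of-2 visual cryptography scheme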
[Figure panels: (a) First run, (b) Second run]

Fig. 2. Proposed (2,2)-VCS (1)



From Fig 3, we can see that:

- If $P$ is black, then $\overline{T} + \overline{T'}$ is always white.

- If $P$ is white, then $\overline{T} + \overline{T'}$ is black with probability 1/2 and grey (half black
and half white) with probability 1/2. This is because $(s_1, s_2)$ and $(s'_1, s'_2)$ are
generated independently and randomly.



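Step 4. Finally the two participants reverse $\overline{T} + \overline{T'}$ and obtain $\overline{\overline{T} + \overline{T'}}$.

It is clear that:

- If $P$ is black, then $\overline{\overline{T} + \overline{T'}}$ is always black.

- If $P$ is white, then $\overline{\overline{T} + \overline{T'}}$ is all white with probability 1/2 and it is grey
(half black and half white) with probability 1/2.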
Fig. 3. Proposed (2,2)-VCS (2)



(Contrast): In our (2,2)-VCS with reversing, we obtain that $\mathrm{GREY}(\text{black}) = 1$
and

$$E[\mathrm{GREY}(\text{white})] = (1/2) \times 0 + (1/2) \times (1/2) = 1/4.$$

4 General Construction: (k, n)-Threshold Case 

In this section, we show a general construction of our (k, n)-VCS with reversing. 
The reconstruction of the black region is perfect and the reconstruction of the white
region is almost perfect.

The cost we have to pay is the size of shares. If the size of shares is $c$ times
larger, then the grey level of the white region converges to zero exponentially.

4.1 c-Run (k,n)-VCS with Reversing

Suppose that there exists a perfect black $(k,n)$-VCS. We then construct a "c-run
$(k,n)$-VCS with reversing" as follows. (Remember that there exists a perfect
black $(k, n)$-VCS for any $2 \le k \le n$.)

Let $P$ be a secret pixel to be distributed.

(Distribution phase) In the distribution phase, the dealer $\mathcal{D}$ runs the distribution
phase of the perfect black $(k, n)$-VCS $c$ times independently. Let
$(s_{1,i}, \dots, s_{n,i})$ be the set of shares in the $i$-th run for $i = 1, \dots, c$. The share
of participant $P_j$ of our VCS is then $(s_{j,1}, \dots, s_{j,c})$.

(Reconstruction phase) Any $k$ participants, say $P_{j_1}, \dots, P_{j_k}$, reconstruct $P$
as follows.

1. For $i = 1, \dots, c$, they superimpose their shares and obtain

$$T_i = s_{j_1,i} + \cdots + s_{j_k,i}.$$

2. They reverse $T_i$ and obtain $\overline{T_i}$ for $i = 1, \dots, c$.

3. They superimpose $\overline{T_1}, \dots, \overline{T_c}$ and obtain $U = \overline{T_1} + \cdots + \overline{T_c}$.

4. They reverse $U$ and obtain $P$, where

$$P = \overline{U} = \overline{\overline{T_1} + \cdots + \overline{T_c}}.$$
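The four reconstruction steps translate into a few list operations. In this sketch (names are illustrative) a transparency is a list of sub-pixels with 1 = black, superimposing is a sub-pixel-wise OR, and reversing flips every sub-pixel, mimicking the copy machine's reversing function.

```python
def superimpose(*vectors: list[int]) -> list[int]:
    """Stack transparencies: sub-pixel-wise OR."""
    return [int(any(bits)) for bits in zip(*vectors)]


def reverse(vector: list[int]) -> list[int]:
    """Reverse black and white on every sub-pixel."""
    return [1 - bit for bit in vector]


def c_run_reconstruct(runs: list[list[list[int]]]) -> list[int]:
    """runs[i] holds the k shares that the qualified participants hold from run i + 1."""
    reversed_ts = [reverse(superimpose(*shares)) for shares in runs]  # Steps 1 and 2
    u = superimpose(*reversed_ts)                                     # Step 3
    return reverse(u)                                                 # Step 4


# Three runs of the Naor-Shamir (2,2)-VCS for a white pixel:
same = [[[1, 0], [1, 0]]] * 3  # all runs chose the same column order
mixed = [[[1, 0], [1, 0]], [[0, 1], [0, 1]], [[1, 0], [1, 0]]]
print(c_run_reconstruct(same), c_run_reconstruct(mixed))
```

A white pixel thus stays grey only when every run happens to use the same column order; otherwise it is reconstructed as completely white, while a black pixel always comes back all black.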

On the other hand, it is clear that any $k - 1$ participants have no information
on $P$ from the property of the original $(k,n)$-VCS.



4.2 Contrast 

We show that the contrast is almost ideal in our construction. It is easy to see
that GREY(black) = 1 because the original VCS is perfect black. We now show
that both E[GREY(white)] and Var[GREY(white)] converge to zero.

Theorem 1. Suppose that GREY(white) $= q < 1$ in the original perfect black
VCS. Then in our c-run VCS with reversing,

(1) $E[\mathrm{GREY}(\text{white})] = q^c$,

(2) $\mathrm{Var}[\mathrm{GREY}(\text{white})] \le q^c(1 - q^c)$.



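Proof. (1) Let $P$ be a white pixel. Each $T_i$ is described by a Boolean vector of
length $m$

$$(a_{i,1}, \dots, a_{i,m}),$$

where $m$ is the expansion rate. Similarly, the reconstructed pixel $P$ is described
by a Boolean vector

$$W = (w_1, \dots, w_m).$$

Now since

$$\overline{w_j} = \overline{a_{1,j}} + \cdots + \overline{a_{c,j}},$$

it holds that

$$w_j = a_{1,j} \times \cdots \times a_{c,j}$$

from De Morgan's law. Therefore,

$$E[w_H(W)] = E\Big[\sum_j w_j\Big] = \sum_j E[a_{1,j} \times \cdots \times a_{c,j}] = \sum_j \Pr(a_{1,j} = \cdots = a_{c,j} = 1) = \sum_j \Pr(a_{1,j} = 1) \times \cdots \times \Pr(a_{c,j} = 1) = \sum_j q^c = mq^c.$$

Here the product uses the independence of the $c$ runs, and $\Pr(a_{i,j} = 1) = q$ because the
columns in each run are permuted uniformly at random.

Consequently, $E[\mathrm{GREY}(\text{white})] = E[w_H(W)]/m = q^c$.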
(2) It is easy to see that $(w_1 + \cdots + w_m) \le m$ because $w_j = 0$ or $w_j = 1$.
Therefore,

$$(w_1 + \cdots + w_m)^2 \le m(w_1 + \cdots + w_m) = m \sum_j w_j.$$

Hence

$$\mathrm{Var}[w_H(W)] = E[w_H(W)^2] - E[w_H(W)]^2 = E\Big[\Big(\sum_j w_j\Big)^2\Big] - (mq^c)^2 \le m E\Big[\sum_j w_j\Big] - m^2 q^{2c} = m E[w_H(W)] - m^2 q^{2c} = m^2 q^c (1 - q^c).$$

Consequently, $\mathrm{Var}[\mathrm{GREY}(\text{white})] = \mathrm{Var}[w_H(W)]/m^2 \le q^c(1 - q^c)$.

□
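Theorem 1 can be checked exhaustively for small schemes. The sketch below (illustrative names) enumerates every equally likely column permutation in each of the $c$ runs, applies $w_j = a_{1,j} \times \cdots \times a_{c,j}$, and returns the exact mean and variance of GREY(white) as fractions. It uses the white basis matrix of Eq. (1) and, for the (3,3) case, the Naor-Shamir matrix whose columns are all even-weight vectors of length 3; enumeration is only feasible for tiny $m$ and $c$.

```python
from fractions import Fraction
from itertools import permutations, product


def run_outcomes(white_basis: list[list[int]]) -> list[tuple[int, ...]]:
    """All equally likely T_i for one run: OR the rows, then permute the columns."""
    m = len(white_basis[0])
    ored = [int(any(row[j] for row in white_basis)) for j in range(m)]
    return [tuple(ored[j] for j in perm) for perm in permutations(range(m))]


def grey_white_stats(white_basis: list[list[int]], c: int) -> tuple[Fraction, Fraction]:
    """Exact E and Var of GREY(white) for the c-run scheme with reversing."""
    m = len(white_basis[0])
    outcomes = run_outcomes(white_basis)
    greys = []
    for ts in product(outcomes, repeat=c):
        # De Morgan: sub-pixel j is black iff it is black in every T_i
        w = [int(all(t[j] for t in ts)) for j in range(m)]
        greys.append(Fraction(sum(w), m))
    mean = sum(greys) / len(greys)
    var = sum((g - mean) ** 2 for g in greys) / len(greys)
    return mean, var


M0_22 = [[1, 0], [1, 0]]                            # Naor-Shamir (2,2), q = 1/2
M0_33 = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]  # Naor-Shamir (3,3), q = 3/4
print(grey_white_stats(M0_22, 3), grey_white_stats(M0_33, 2))
```

For the (2,2) base scheme with $c = 3$ the mean is exactly 1/8, in line with Corollary 2.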



Therefore,

$$\lim_{c \to \infty} E[\mathrm{GREY}(\text{white})] = 0 \text{ and } \lim_{c \to \infty} \mathrm{Var}[\mathrm{GREY}(\text{white})] = 0.$$

This means that we can obtain asymptotically ideal contrast by letting $c$ be large.



4.3 Complexity 

The reconstruction phase of the c-run $(k,n)$-VCS with reversing requires $c + 1$
reversing operations and $kc - 1$ superimposing operations. The size of the shares
becomes $c$ times larger than that of the original VCS.



4.4 Corollaries 

From the previous result [6], we obtain the following corollary. 

Corollary 1. For any $2 \le k \le n$, there exists a perfect black $(k,n)$-VCS with reversing
such that

$$E[\mathrm{GREY}(\text{white})] = (1 - 1/m)^c,$$

$$\mathrm{Var}[\mathrm{GREY}(\text{white})] \le (1 - 1/m)^c \{1 - (1 - 1/m)^c\}$$

for some expansion rate $m$, where $c$ is any positive integer.

If we use the Naor-Shamir (2,2)-VCS, we obtain the following corollary.

Corollary 2. There exists a perfect black (2,2)-VCS with reversing such that

$$E[\mathrm{GREY}(\text{white})] = (1/2)^c,$$

$$\mathrm{Var}[\mathrm{GREY}(\text{white})] \le (1/2)^c \{1 - (1/2)^c\}$$

with the expansion rate $m = 2$, where $c$ is any positive integer.
As an example, we present a 3-run (2,2)-VCS.

(Distribution phase) The dealer $\mathcal{D}$ runs the distribution phase of the Naor-Shamir
(2,2)-VCS three times independently. Let $(s_1, s_2)$ be the shares of the first run,
$(s'_1, s'_2)$ be the shares of the second run and $(s''_1, s''_2)$ be the shares of the
third run.

Then the share of participant $P_1$ is $(s_1, s'_1, s''_1)$ and that of participant
$P_2$ is $(s_2, s'_2, s''_2)$.

(Reconstruction phase)

1. We superimpose $s_1$ and $s_2$, and then obtain $T = s_1 + s_2$. Similarly, we obtain
$T' = s'_1 + s'_2$ and $T'' = s''_1 + s''_2$.

2. We reverse $T$, $T'$ and $T''$, and obtain $\overline{T}$, $\overline{T'}$ and $\overline{T''}$.

3. We superimpose $\overline{T}$, $\overline{T'}$, $\overline{T''}$ and obtain $U = \overline{T} + \overline{T'} + \overline{T''}$.

4. We reverse $U$ and obtain $P$.

(Contrast): We can then see that GREY(black) = 1. A white pixel is reconstructed as grey
(half black and half white) only if all three runs use the same column order, which
happens with probability 1/4, and as all white otherwise. Hence

$$E[\mathrm{GREY}(\text{white})] = (1/4) \times (1/2) + (3/4) \times 0 = 1/8.$$

5 Perfect White VCS 

5.1 Conversion from Perfect Black VCS 

We say that a $(k, n)$-VCS is perfect white if

$$\mathrm{GREY}(\text{white}) = 0 \text{ and } \mathrm{GREY}(\text{black}) > 0.$$

In usual pictures, the white region is much larger than the black region. Therefore,
perfect white VCSs are much preferable to perfect black VCSs. However,
no perfect white VCS has been known.

In this section, we show that a perfect white $(k,n)$-VCS with reversing is
easily obtained from a perfect black $(k,n)$-VCS (with reversing).

Theorem 2. Suppose that there exists a perfect black $(k,n)$-VCS such that
$E[\mathrm{GREY}(\text{white})] = p$. Then there exists a perfect white $(k,n)$-VCS such that
$E[\mathrm{GREY}(\text{black})] = 1 - p$.

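Proof. We describe a perfect white $(k,n)$-VCS.

In the distribution phase,

1. the dealer $\mathcal{D}$ first reverses the original image $I$ and obtains $\overline{I}$.

2. $\mathcal{D}$ then applies the distribution phase of the perfect black $(k,n)$-VCS to $\overline{I}$.

In the reconstruction phase,

1. a qualified subset of participants applies the reconstruction phase of the perfect
black $(k,n)$-VCS and obtains a reconstructed image $\hat{I}$.

2. They finally reverse $\hat{I}$ and obtain $\overline{\hat{I}}$.

Then it is easy to see that the above scheme is a perfect white $(k,n)$-VCS
such that $E[\mathrm{GREY}(\text{black})] = 1 - p$: a white pixel of $I$ is black in $\overline{I}$, is reconstructed
as all black and reversed to all white, while a black pixel of $I$ is white in $\overline{I}$, is
reconstructed with expected grey level $p$ and reversed to expected grey level $1 - p$.

□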

For the original perfect black $(k,n)$-VCS with reversing, we can use our
construction shown in the previous section.



5.2 Example 

As an example, we show how to convert the perfect black (2,2)-VCS of Sec. 2.2
into a perfect white (2,2)-VCS with reversing. (See Fig 4.)

In the distribution phase,

1. the dealer $\mathcal{D}$ first reverses the original image $I$. Hence each white pixel is
reversed into black and each black pixel is reversed into white.

2. $\mathcal{D}$ then applies the distribution phase of the perfect black (2,2)-VCS. Participant
$P_1$ obtains a share $s_1$ and participant $P_2$ obtains a share $s_2$.

In the reconstruction phase,

1. the two participants superimpose $s_1$ and $s_2$ and obtain $s_1 + s_2$.

2. They finally reverse $s_1 + s_2$ and obtain $\overline{s_1 + s_2}$.

From Fig 4, we see that a perfect white (2,2)-VCS is obtained such that
GREY(black) = 1/2.
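The conversion only wraps two reversals around the perfect black scheme. The sketch below applies it to the Naor-Shamir (2,2)-VCS of Eq. (1); shares are lists of sub-pixels with 1 = black, and the function names are illustrative.

```python
import random

M0 = [[1, 0], [1, 0]]  # Naor-Shamir (2,2) basis matrix for a white pixel
M1 = [[1, 0], [0, 1]]  # ... and for a black pixel


def perfect_black_encode(pixel_is_black: bool, rng: random.Random) -> list[list[int]]:
    """Distribution phase of the perfect black scheme: randomly permute the columns."""
    basis = M1 if pixel_is_black else M0
    order = list(range(len(basis[0])))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in basis]


def perfect_white_encode(pixel_is_black: bool, rng: random.Random) -> list[list[int]]:
    # Step 1: reverse the secret pixel; Step 2: use the perfect black scheme.
    return perfect_black_encode(not pixel_is_black, rng)


def perfect_white_reconstruct(s1: list[int], s2: list[int]) -> list[int]:
    # Step 1: superimpose (OR); Step 2: reverse the result.
    return [1 - (a | b) for a, b in zip(s1, s2)]


rng = random.Random(7)
print(perfect_white_reconstruct(*perfect_white_encode(False, rng)))  # [0, 0]: white is perfect
```

A black secret pixel, by contrast, always comes back with exactly one black sub-pixel out of two, i.e. GREY(black) = 1/2.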
Fig. 4. Perfect white 2-out-of-2 visual cryptography scheme
6 Perfect Black VCS for General Access Structure

Perfect black VCSs have been known only for $(k, n)$-threshold cases so far,
although VCSs themselves can be constructed for general access structures [1]. In this
section, we show a perfect black VCS for any monotone access structure. This
means that we can obtain a VCS with reversing for any monotone access
structure such that the contrast is almost ideal.

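Let $(M_{n,0}, M_{n,1})$ be the basis matrices of a perfect black $(n, n)$-VCS of Naor
and Shamir [9]. Let

$$M_{n,0} = \begin{pmatrix} e_{n,1} \\ \vdots \\ e_{n,n} \end{pmatrix}, \quad M_{n,1} = \begin{pmatrix} e'_{n,1} \\ \vdots \\ e'_{n,n} \end{pmatrix},$$

where $e_{n,i}$ and $e'_{n,i}$ are binary vectors of length $2^{n-1}$.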

6.1 Access Structure 

Let $\mathcal{P} = \{1, \dots, n\}$ be a set of participants. $(k, n)$-threshold secret sharing
schemes are generalized to secret sharing schemes with monotone access structures
$\Gamma$ [8,2], where $\Gamma$ is the set of all subsets of participants which can determine the
secret.

In a secret sharing scheme, on input a secret $s$, a dealer $\mathcal{D}$ computes
$(v_1, \dots, v_n)$ and gives $v_i$ to participant $i$ so that only qualified subsets of
participants (access sets) can recover the secret. Let

$$\Gamma = \{A \subseteq \mathcal{P} \mid A \text{ can determine } s\}.$$

Then $\Gamma$ is called an access structure and $A \in \Gamma$ is called an access set. Let
$\Gamma_0 = \{A \subseteq \mathcal{P} \mid A \text{ is a minimal access set}\}$.

Definition 2. $\Gamma$ is said to be monotone if

$$A \in \Gamma,\ A \subseteq A' \implies A' \in \Gamma.$$

We require that any $B \notin \Gamma$ has no information on $s$. Then it is known that
a secret sharing scheme for $\Gamma$ exists if and only if $\Gamma$ is monotone.

6.2 General Construction 

We show a pair of basis matrices $(L_0, L_1)$ of a perfect black VCS for any monotone
access structure $\Gamma$. For $\Gamma_0$, there exists a usual type secret sharing scheme
as follows. Suppose that $\Gamma_0 = \{A_1, \dots, A_t\}$, where $A_j = \{j_1, \dots, j_{|A_j|}\}$. For
$1 \le j \le t$, the dealer $\mathcal{D}$ chooses random bits such that

$$s = b_{j,1} \oplus \cdots \oplus b_{j,|A_j|},$$

and gives $b_{j,u}$ to participant $j_u$.

Now let $G = (g_{ij})$ be an intermediate dummy $n \times t$ matrix.

(Construction of $L_0$): If $i = j_u$ for some $1 \le u \le |A_j|$, then replace $g_{ij}$ with
$e_{|A_j|,u}$. Else, replace $g_{ij}$ with $(1, \dots, 1)$. Then we obtain $L_0$.

(Construction of $L_1$): If $i = j_u$ for some $1 \le u \le |A_j|$, then replace $g_{ij}$ with
$e'_{|A_j|,u}$. Else, replace $g_{ij}$ with $(1, \dots, 1)$. Then we obtain $L_1$.

In this construction, the expansion rate is $m = \sum_{j=1}^{t} 2^{|A_j|-1}$ and, for each minimal access set,

$$\mathrm{GREY}(\text{white}) = 1 - 1/m.$$

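We show an example for $\Gamma_0 = \{\{1, 2\}, \{2, 3, 4\}\}$ below.

$$L_0 = \begin{pmatrix} 1&0&1&1&1&1 \\ 1&0&0&0&1&1 \\ 1&1&0&1&0&1 \\ 1&1&0&1&1&0 \end{pmatrix}, \qquad L_1 = \begin{pmatrix} 1&0&1&1&1&1 \\ 0&1&0&0&1&1 \\ 1&1&0&1&0&1 \\ 1&1&1&0&0&1 \end{pmatrix}$$

This is a perfect black VCS for $\Gamma_0$. The expansion rate is $m = 2^{2-1} + 2^{3-1} = 6$ and $\mathrm{GREY}(\text{white}) = 1/m = 1/6$.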
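The general construction can be sketched in a few lines of Python. This is a minimal illustration under our own encoding choices, not code from the paper: $M_{k,0}$ and $M_{k,1}$ are taken to be the Naor-Shamir matrices whose columns are all even-weight and all odd-weight vectors of $\{0,1\}^k$ (1 = black subpixel), participants are numbered from 1, and the helper names (`naor_shamir_basis`, `general_basis`, `stack`, `check_vcs`) are hypothetical. Because of our column ordering, the matrices produced agree with the example above only up to a permutation of columns. The check assumes that a qualified set decodes by stacking the shares of a minimal access set it contains, so it tests the contrast condition on the members of $\Gamma_0$ and the security condition on every set that contains no member of $\Gamma_0$.

```python
from itertools import combinations, product


def naor_shamir_basis(k):
    """Rows of (M_{k,0}, M_{k,1}) for a perfect black (k, k)-VCS.

    M_{k,0} has every even-weight column of {0,1}^k and M_{k,1} every
    odd-weight one, so both have 2**(k-1) columns (1 = black subpixel).
    """
    cols = list(product((0, 1), repeat=k))
    even = [c for c in cols if sum(c) % 2 == 0]
    odd = [c for c in cols if sum(c) % 2 == 1]
    return ([[c[i] for c in even] for i in range(k)],
            [[c[i] for c in odd] for i in range(k)])


def general_basis(n, minimal_sets):
    """Concatenate one column block per minimal access set A_j."""
    L0 = [[] for _ in range(n)]
    L1 = [[] for _ in range(n)]
    for A in minimal_sets:
        members = sorted(A)                      # j_1 < ... < j_|A_j|
        M0, M1 = naor_shamir_basis(len(members))
        width = 2 ** (len(members) - 1)
        for i in range(1, n + 1):
            if i in members:                     # i = j_u: row u of the basis matrix
                u = members.index(i)
                L0[i - 1] += M0[u]
                L1[i - 1] += M1[u]
            else:                                # i not in A_j: all-ones block
                L0[i - 1] += [1] * width
                L1[i - 1] += [1] * width
    return L0, L1


def stack(L, S):
    """Stacking the transparencies of S is the bitwise OR of their rows."""
    return [max(L[i - 1][c] for i in S) for c in range(len(L[0]))]


def check_vcs(n, minimal_sets):
    L0, L1 = general_basis(n, minimal_sets)
    m = len(L0[0])
    for A in minimal_sets:
        # perfect black: all m subpixels black; white: exactly one white subpixel
        if sum(stack(L1, A)) != m or sum(stack(L0, A)) != m - 1:
            return False
    for size in range(1, n + 1):
        for S in combinations(range(1, n + 1), size):
            if any(set(A) <= set(S) for A in minimal_sets):
                continue
            # forbidden set: L0 and L1 restricted to S agree up to column order
            cols0 = sorted(zip(*(L0[i - 1] for i in S)))
            cols1 = sorted(zip(*(L1[i - 1] for i in S)))
            if cols0 != cols1:
                return False
    return True


L0, L1 = general_basis(4, [{1, 2}, {2, 3, 4}])
print(len(L0[0]), check_vcs(4, [{1, 2}, {2, 3, 4}]))  # 6 True
```

Representing each row as a plain list keeps the block structure of $G$ visible: each minimal access set contributes one contiguous block of $2^{|A_j|-1}$ columns.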



6.3 Construction for Special Cases 

In this subsection, we present a better construction for $\Gamma_0 = \{\{1, 2\}, \{2, 3\}, \{3, 4\}\}$. For $\Gamma_0$, there exists a conventional secret sharing scheme as follows. Let $\{0, 1\}$ be the set of secrets. The dealer $\mathcal{D}$ chooses random bits $b_1, \ldots, b_4$ such that

$$s = b_1 \oplus b_2 = b_3 \oplus b_4.$$

The shares are $v_1 = b_1$, $v_2 = b_2$, $v_3 = (b_1, b_3)$, and $v_4 = b_4$.

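Now, as an intermediate dummy matrix, let

$$G = \begin{pmatrix} b_1 & x \\ b_2 & x \\ b_1 & b_3 \\ x & b_4 \end{pmatrix}.$$

(Construction of $L_0$): In $G$, replace $b_1$ and $b_3$ with $e_{2,1}$. Replace $b_2$ and $b_4$ with $e_{2,2}$. Replace $x$ with $(1, 1)$.

(Construction of $L_1$): In $G$, replace $b_1$ and $b_3$ with $\bar{e}_{2,1}$. Replace $b_2$ and $b_4$ with $\bar{e}_{2,2}$. Replace $x$ with $(1, 1)$.

Then we obtain $(L_0, L_1)$ as follows.

$$L_0 = \begin{pmatrix} 1&0&1&1 \\ 1&0&1&1 \\ 1&0&1&0 \\ 1&1&1&0 \end{pmatrix}, \qquad L_1 = \begin{pmatrix} 1&0&1&1 \\ 0&1&1&1 \\ 1&0&1&0 \\ 1&1&0&1 \end{pmatrix}$$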
The expansion rate is $m = 4$ and $\mathrm{GREY}(\text{white}) = 1/4$. If we use the general construction, then the expansion rate is $m = 6$ and $\mathrm{GREY}(\text{white}) = 1/6$.
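The special construction can be checked mechanically. The sketch below (hypothetical helper names) substitutes 2-bit rows into $G$, taking $e_{2,1} = e_{2,2} = (1,0)$ and $\bar{e}_{2,1} = (1,0)$, $\bar{e}_{2,2} = (0,1)$, which is what the first rows of $L_0$ and $L_1$ in this example show, and then tests that each minimal access set sees a black pixel as fully black and a white pixel with one white subpixel, while every forbidden set sees identical column multisets. As in the general case, it assumes that a qualified set decodes through a minimal access set it contains.

```python
from itertools import combinations

# Intermediate matrix G for Gamma_0 = {{1,2},{2,3},{3,4}}; row i is participant i.
G = [["b1", "x"], ["b2", "x"], ["b1", "b3"], ["x", "b4"]]

# Substitutions for L_0 and L_1 (rows of the (2,2) basis matrices; x -> (1,1)).
SUB0 = {"b1": [1, 0], "b3": [1, 0], "b2": [1, 0], "b4": [1, 0], "x": [1, 1]}
SUB1 = {"b1": [1, 0], "b3": [1, 0], "b2": [0, 1], "b4": [0, 1], "x": [1, 1]}
MINIMAL = [{1, 2}, {2, 3}, {3, 4}]


def expand(G, sub):
    """Replace every entry of G by its bit vector, row by row."""
    return [[bit for entry in row for bit in sub[entry]] for row in G]


def stack(L, S):
    """OR of the rows of the participants in S."""
    return [max(L[i - 1][c] for i in S) for c in range(len(L[0]))]


L0, L1 = expand(G, SUB0), expand(G, SUB1)
m = len(L0[0])  # expansion rate

ok = all(sum(stack(L1, A)) == m and sum(stack(L0, A)) == m - 1 for A in MINIMAL)
for size in (1, 2, 3):
    for S in combinations(range(1, 5), size):
        if not any(A <= set(S) for A in MINIMAL):
            ok &= sorted(zip(*(L0[i - 1] for i in S))) == sorted(zip(*(L1[i - 1] for i in S)))
print(m, ok)
```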

We can apply the same technique to $\Gamma_0 = \{\{1, 2\}, \{1, 3\}, \{2, 3, 4\}\}$, but not to $\Gamma_0 = \{\{1, 2\}, \{2, 3\}, \{3, 4\}, \{2, 4\}\}$ nor to $\Gamma_0 = \{\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3, 4\}\}$. The details will be given in the final paper.
Abstract. We demonstrate that some finite fields, including $\mathbb{F}_{2^{210}}$, are weak for elliptic curve cryptography in the sense that any instance of the elliptic curve discrete logarithm problem for any elliptic curve over these fields can be solved in significantly less time than it takes Pollard's rho method to solve the hardest instances. We discuss the implications of our observations for elliptic curve cryptography, and list some open problems.



1 Introduction 

Elliptic curve cryptography (ECC) is being standardized by accredited standards organizations and governments around the world. The security of elliptic curve systems is based on the hardness of the elliptic curve discrete logarithm problem (ECDLP): given an elliptic curve $E$ defined over a finite field $\mathbb{F}_q$, a point $P \in E(\mathbb{F}_q)$ of order $r$, and a second point $Q \in \langle P \rangle$, determine the integer $l \in [0, r-1]$ such that $Q = lP$. Elliptic curve systems are especially attractive because Pollard's rho method [34], the best algorithm known for solving the general ECDLP, has a fully-exponential expected running time of $\sqrt{\pi r/2}$ point additions.
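To make the rho method concrete, here is a toy sketch of Pollard's rho for the ECDLP with an $r$-adding random walk (in the spirit of Teske's walks [41]) and Floyd cycle detection. It is an illustration under stated assumptions, not the implementation analysed in this paper: for brevity it uses the small textbook curve $y^2 = x^3 + 2x + 2$ over the prime field $\mathbb{F}_{17}$, whose group has prime order 19, rather than a curve over $\mathbb{F}_{2^N}$, and all function names and the number of walk partitions are our own choices.

```python
import random

# Toy parameters (assumption): y^2 = x^3 + 2x + 2 over F_17, base point P0 = (5, 1).
p, a = 17, 2      # b = 2 is not needed by the addition formulas
O = None          # point at infinity
P0 = (5, 1)


def add(P, Q):
    """Affine point addition on y^2 = x^3 + a*x + b over F_p."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def mul(k, P):
    """Double-and-add scalar multiplication kP."""
    R = O
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R


def order(P):
    """Order of P by brute force (fine for a toy curve)."""
    n, R = 1, P
    while R is not O:
        R = add(R, P)
        n += 1
    return n


def pollard_rho(P, Q, r, rng, parts=16):
    """Return l with Q = lP, for P of prime order r.

    Walk X -> X + R_j with R_j = c_j P + d_j Q, where j is taken from the
    x-coordinate of X, and detect a collision with Floyd's method.
    """
    while True:
        coeffs = [(rng.randrange(r), rng.randrange(r)) for _ in range(parts)]
        steps = [add(mul(c, P), mul(d, Q)) for c, d in coeffs]

        def walk(state):
            X, c, d = state
            j = 0 if X is O else X[0] % parts
            cj, dj = coeffs[j]
            return add(X, steps[j]), (c + cj) % r, (d + dj) % r

        c0, d0 = rng.randrange(r), rng.randrange(r)
        tortoise = hare = (add(mul(c0, P), mul(d0, Q)), c0, d0)
        while True:
            tortoise, hare = walk(tortoise), walk(walk(hare))
            if tortoise[0] == hare[0]:
                break
        (_, c1, d1), (_, c2, d2) = tortoise, hare
        if (d2 - d1) % r:                    # c1 P + d1 Q = c2 P + d2 Q
            return (c1 - c2) * pow((d2 - d1) % r, -1, r) % r
        # degenerate collision (d1 = d2): restart with a fresh walk


r = order(P0)                                      # 19 for this curve
print(pollard_rho(P0, mul(7, P0), r, random.Random(1)))  # 7
```

Because the walk depends only on the current point, the sequence of points is eventually periodic, which is what lets Floyd's tortoise-and-hare find a collision in roughly $\sqrt{\pi r/2}$ steps without storing the walk.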

For a given underlying field $\mathbb{F}_q$, maximum resistance to Pollard's rho method can be attained by selecting an elliptic curve $E$ for which $r$ is prime and is as large as possible. The most favourable situation arises when $\#E(\mathbb{F}_q)$ is prime or almost prime, i.e., $\#E(\mathbb{F}_q) = dr$, where $r$ is prime and the co-factor $d$ is small (e.g., $d \in \{1, 2, 3, 4\}$). In this case, since $\#E(\mathbb{F}_q)$ lies in the Hasse interval $[(\sqrt{q} - 1)^2, (\sqrt{q} + 1)^2]$, we have $r \approx q$ and we say that the elliptic curve has a security level of $\frac{1}{2}\log_2 q$ bits.

Some ECC standards recommend or mandate a small selection of finite fields and elliptic curves. Among these, the most influential has been the FIPS 186-2 standard [8] for the elliptic curve digital signature algorithm (ECDSA), which recommends five prime fields $\mathbb{F}_p$ for specified primes $p$ of bitlengths 192, 224, 256, 384, and 521, and the five characteristic two finite fields $\mathbb{F}_{2^{163}}$, $\mathbb{F}_{2^{233}}$, $\mathbb{F}_{2^{283}}$, $\mathbb{F}_{2^{409}}$, and $\mathbb{F}_{2^{571}}$. The recommended elliptic curves over these fields have security levels of approximately 80, 112, 128, 192, and 256 bits, which match the security levels of the SKIPJACK, Triple-DES, AES-Small, AES-Medium, and AES-Large symmetric-key encryption schemes. Fixing a small set of allowable fields has the advantages of facilitating interoperability and permitting the optimization of hardware and software implementations by exploiting properties of the chosen fields.

T. Okamoto (Ed.): CT-RSA 2004, LNCS 2964, pp. 366-386, 2004.

It is therefore reasonable to expect that commercial deployments of ECC will converge upon a small selection of finite fields. This does not appear to be a serious limitation, for the following reasons. First, there is an enormous number of elliptic curves to choose from; more precisely, there are roughly $2q$ isomorphism classes of elliptic curves over $\mathbb{F}_q$. Second, the orders of these curves are roughly uniformly distributed over the Hasse interval in the case of prime fields, and over the even integers in the Hasse interval in the case of characteristic two finite fields. Consequently, elliptic curves of almost prime orders are plentiful and can be easily found. Finally, there are very few elliptic curves of almost prime order over a field $\mathbb{F}_q$ for which the ECDLP can be solved in subexponential (or faster) time: those that succumb to the Weil and Tate pairing attacks [12,29], and the attack on prime-field anomalous curves [35,36,38]. It is easy to recognize these curves, and thus the aforementioned attacks can readily be circumvented.

Nonetheless, the possibility still remains that algorithms will subsequently be discovered for efficiently solving any instance of the ECDLP for any elliptic curve over a selected field. If ECC solutions employing that field were widely deployed (especially in hardware), then the consequences of such a discovery would be more drastic than if an attack were discovered on a special class of curves, because a change in the underlying field would be required. Determining whether such finite fields exist is therefore an important problem in elliptic curve cryptography.

Definition 1. A finite field $\mathbb{F}_q$ is said to be bad for elliptic curve cryptography if the following conditions are satisfied:

1. for some elliptic curves $E$ over $\mathbb{F}_q$, solving the ECDLP in $E(\mathbb{F}_q)$ using Pollard's rho method (and its parallelized versions [31]) is intractable using existing computer technology; and

2. algorithms are known that can feasibly solve (using existing computer technology) any ECDLP instance for any elliptic curve over $\mathbb{F}_q$.

No bad fields for ECC are presently known. The contribution of this paper 
is the observation that some finite fields are weak in the following sense. 



Definition 2. A finite field $\mathbb{F}_q$ is said to be weak for elliptic curve cryptography if the following conditions are satisfied:

1. for some elliptic curves $E$ over $\mathbb{F}_q$, solving the ECDLP in $E(\mathbb{F}_q)$ using Pollard's rho method (and its parallelized versions [31]) is intractable using existing computer technology; and

2. algorithms are known for which any ECDLP instance for any elliptic curve over $\mathbb{F}_q$ can be solved in significantly less time than it takes Pollard's rho method to solve the hardest ECDLP instances over $\mathbb{F}_q$.
While the ECDLP for elliptic curves over a weak field may in fact be intractable in general, demonstrating that a field is weak provides some evidence that the field may be bad, and therefore unsuitable for elliptic curve cryptography.

Of course our definition of a weak field is not precise, since "significantly less" has not been quantified. We remark that the discovery [16,45] of a $\sqrt{N}$-speedup of Pollard's rho method for solving the ECDLP in the group of $\mathbb{F}_{2^N}$-rational points on a Koblitz curve¹ caused some to view the security of these curves with suspicion. For Koblitz curves over $\mathbb{F}_{2^{163}}$ and $\mathbb{F}_{2^{283}}$, the speedup is by a factor of only 13 and 17, respectively. In this paper, we present reasonable arguments that the finite fields $\mathbb{F}_{2^N}$, where $N \in [185, 600]$ is divisible by 5, are weak fields for ECC. In particular, we show that the ECDLP for all elliptic curves over $\mathbb{F}_{2^{210}}$ (respectively, one-quarter of all elliptic curves over $\mathbb{F}_{2^{210}}$) can be solved $2^{\ldots}$ times faster (respectively, $2^{\ldots}$ times faster) than it takes Pollard's rho method to solve the hardest instances. These speedups are significantly greater than the aforementioned speedups for Koblitz curves, and moreover are applicable to all (respectively, one-quarter of all) elliptic curves over $\mathbb{F}_{2^{210}}$. While up to now it was believed that an elliptic curve over $\mathbb{F}_{2^{210}}$ whose group order is twice a prime offers a security level of 104 bits, our results show it can have a security level of at most 91 bits, that is, the same as a curve over $\mathbb{F}_{2^{183}}$ is able to offer. The field $\mathbb{F}_{2^{210}}$ is interesting because its arithmetic can be efficiently implemented by successive extensions, e.g., $\mathbb{F}_{2^2} \subset \mathbb{F}_{2^6} \subset \mathbb{F}_{2^{30}} \subset \mathbb{F}_{2^{210}}$. As another example, we show that the ECDLP for all elliptic curves over $\mathbb{F}_{2^{600}}$ can be solved about $2^{\ldots}$ times faster than it takes Pollard's rho method to solve the hardest instances. Hence an elliptic curve over $\mathbb{F}_{2^{600}}$ can have a security level of at most 230 bits.

Organization. The remainder of this paper is organized as follows. In Section 2, we summarize the recent work on the Weil descent attack on the ECDLP. Our detailed arguments that the fields $\mathbb{F}_{2^N}$, where $N \in [185, 600]$ is divisible by 5, are weak are presented in Section 3. In Section 4, we examine the fields $\mathbb{F}_{2^N}$, where $N$ is divisible by 4, for weakness. In Section 5, we further explore the special case $N = 210$. We draw our conclusions in Section 6 and list some interesting open problems.

2 Weil Descent Attack on the ECDLP 

Frey [11] first proposed using Weil descent as a means to reduce the ECDLP in elliptic curves over finite fields $\mathbb{F}_{q^n}$ to the discrete logarithm problem in the Jacobian variety of a curve of larger genus over the proper subfield $\mathbb{F}_q$. If a subexponential-time algorithm is known for the DLP for the resulting curve, then this could lead to an algorithm that solves the original ECDLP instance faster than Pollard's rho method.

¹ A Koblitz curve is an elliptic curve defined over $\mathbb{F}_2$. There are two such curves: $y^2 + xy = x^3 + 1$ and $y^2 + xy = x^3 + x^2 + 1$. These curves admit fast point multiplication algorithms (see [40]) and are therefore favoured over other curves defined over $\mathbb{F}_{2^N}$.
Let $l$ and $n$ be positive integers, and let $N = ln$. Let $q = 2^l$, and let $k = \mathbb{F}_q$ and $K = \mathbb{F}_{q^n}$. Consider the (non-supersingular) elliptic curve $E$ defined over $K$ by the equation

$$E : y^2 + xy = x^3 + ax^2 + b, \qquad a \in K,\ b \in K^*.$$

We assume that $\#E(K) = dr$, where $d$ is small and $r$ is prime, whence $r \approx q^n$. Let $b_i = \sigma^i(b)$, where $\sigma : K \to K$ is the Frobenius automorphism defined by $\alpha \mapsto \alpha^q$. The magic number for $E$ relative to $n$ is defined to be

$$m = m(b) = \dim_{\mathbb{F}_2}\left(\mathrm{Span}_{\mathbb{F}_2}\{(1, b_0^{1/2}), (1, b_1^{1/2}), \ldots, (1, b_{n-1}^{1/2})\}\right). \qquad (1)$$
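Definition (1) can be evaluated directly in small toy fields. The sketch below is an illustration under stated assumptions: it works in $\mathbb{F}_{2^6}$ represented as $\mathbb{F}_2[z]/(z^6 + z + 1)$ (an irreducible modulus chosen only for this example), encodes field elements as bit masks, computes $b_i^{1/2}$ as $b_i^{2^{N-1}}$, prepends the $\mathbb{F}_2$-coordinate 1 as an extra top bit, and returns the $\mathbb{F}_2$-rank of the $n$ resulting vectors. All names are hypothetical. Since $\sigma$ fixes $\mathbb{F}_q$, every $b \in \mathbb{F}_q$ yields $m = 1$.

```python
N = 6                 # toy extension degree (assumption)
MOD = 0b1000011       # z^6 + z + 1, irreducible over F_2


def gf_mul(x, y):
    """Multiply in F_{2^N} = F_2[z]/(MOD); elements are bit masks."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> N) & 1:
            x ^= MOD
    return r


def gf_pow(x, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def f2_rank(vectors):
    """Rank over F_2 of integers viewed as bit vectors (Gaussian elimination)."""
    pivots = {}                       # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)


def magic_number(b, l, n):
    """m(b) = dim Span{(1, b_i^(1/2)) : 0 <= i < n}, with b_i = sigma^i(b)."""
    assert l * n == N
    vecs, bi = [], b
    for _ in range(n):
        root = gf_pow(bi, 2 ** (N - 1))   # square root in F_{2^N}
        vecs.append((1 << N) | root)      # top bit holds the F_2 coordinate 1
        bi = gf_pow(bi, 2 ** l)           # sigma: alpha -> alpha^q with q = 2^l
    return f2_rank(vecs)
```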

Assume now that either $n$ is odd, or $m(b) = n$, or $\mathrm{Tr}_{K/\mathbb{F}_2}(a) = 0$. Gaudry, Hess and Smart [18] showed how Weil descent can be used to reduce instances of the ECDLP in the subgroup of order $r$ of $E(K)$ to instances of the hyperelliptic curve discrete logarithm problem (HCDLP) in a subgroup of order $r$ of the Jacobian $J_C(k)$ of a hyperelliptic curve $C$ of genus $g = 2^{m-1} - 1$ or $2^{m-1}$ defined over $k$. One first constructs the Weil restriction $W_{E/k}$ of scalars of $E$, which is an $n$-dimensional abelian variety over $k$. Then $W_{E/k}$ is intersected with $n - 1$ hyperplanes to eventually obtain the hyperelliptic curve $C$ from an irreducible reduced component in the intersection. The reduction algorithm, together with the fastest known algorithm for solving the HCDLP in $J_C(k)$, is called the GHS attack on the ECDLP.

Since subexponential-time algorithms are known for the HCDLP for large genus hyperelliptic curves [1], it is possible that the GHS attack can solve the original ECDLP instance faster than Pollard's rho method. In [30], it was shown that for all elliptic curves over $\mathbb{F}_{2^N}$ where $N \in [160, 600]$ is prime, the genus $g$ of $C$ is either too small (whereby the attack fails because $J_C(\mathbb{F}_2)$ is too small to yield any non-trivial information about the ECDLP in $E(\mathbb{F}_{2^N})$), or too large ($g \ge 2^{16} - 1$, whereby the attack fails because the HCDLP in $J_C(\mathbb{F}_2)$ is intractable). In [23], the GHS attack was used to solve an instance of the ECDLP over $\mathbb{F}_{2^{124}}$ (which is infeasible to solve using Pollard's rho method) by reducing it to an instance of the HCDLP in a genus 31 hyperelliptic curve over $\mathbb{F}_{2^4}$ and solving the latter using the Enge-Gaudry algorithm [17,7]. A convincing argument was presented that the GHS attack could also be used to solve instances of the ECDLP for a certain class of elliptic curves over $\mathbb{F}_{2^{155}}$ by reducing them to instances of the HCDLP in genus 31 hyperelliptic curves over $\mathbb{F}_{2^5}$. The effectiveness of the GHS attack for elliptic curves over $\mathbb{F}_{2^N}$ where $N \in [100, 600]$ is composite was extensively analyzed in [28], where the elliptic curves most susceptible were identified and enumerated. In Section 3 we examine in greater detail the effectiveness of the GHS attack on the ECDLP over fields $\mathbb{F}_{2^N}$ where $N$ is a multiple of 5. In Section 4, we study the case where $N$ is a multiple of 4. The special case $N = 210$ is further examined in Section 5.

3 The Fields $\mathbb{F}_{2^N}$ with $N = 5l$

In this section, we argue that the fields $\mathbb{F}_{2^N}$ with $N = 5l$ are weak for ECC. We restrict our attention to $l \in [32, 120]$ (equivalently $N \in [160, 600]$), since these are the values of interest for cryptographic applications. We draw our conclusions by analyzing Pollard's rho method and the GHS attack for solving instances of the ECDLP over these fields. We emphasize that our conclusions are meaningful for practice because our analyses are exact; that is, they do not involve any asymptotics, crude approximations, or hidden constants.

3.1 Exact Analysis of Pollard’s Rho Method 

The instances of the ECDLP over $\mathbb{F}_{2^N}$ most resistant to Pollard's rho method (using the random walk of Teske [41]) are for elliptic curves $E$ that have almost prime order $\#E(\mathbb{F}_{2^N}) = 2r$ for some prime $r$. Since $r \approx 2^{N-1}$, Pollard's rho method has an expected running time of $\sqrt{\pi r/2} \approx \sqrt{\pi 2^{N-1}/2}$ steps, where the dominant operation in each step is an addition in $E(\mathbb{F}_{2^N})$. We note that even though the expression $\sqrt{\pi r/2}$ for the running time of Pollard's rho method is an asymptotic one (as $r \to \infty$), it has been proven under reasonable assumptions [42, Corollary 5.1] that the actual running time for any fixed value of $r$ is within a very small constant multiple of the asymptotic time. Thus

$$T_p = \sqrt{\pi 2^{N-1}/2} \approx 2^{0.5N - 0.2} \qquad (2)$$

is indeed a very accurate approximation for the running time of Pollard's rho method for solving the hardest instances of the ECDLP over $\mathbb{F}_{2^N}$.
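The exponent in (2) is easy to check numerically. The hypothetical helper below evaluates $\log_2 \sqrt{\pi 2^{N-1}/2} = (N-1)/2 + \frac{1}{2}\log_2(\pi/2)$ in the log domain (so no huge floats are formed) and can be compared against the rounded form $0.5N - 0.2$; for $N = 160$ and $N = 600$ it gives the values $2^{79.8}$ and $2^{299.8}$ quoted in Section 3.2.

```python
import math


def log2_rho_steps(N):
    """log2 of T_p = sqrt(pi * 2^(N-1) / 2), computed in the log domain."""
    return (N - 1) / 2 + 0.5 * math.log2(math.pi / 2)


for N in (160, 185, 210, 600):
    print(N, round(log2_rho_steps(N), 2), 0.5 * N - 0.2)
```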

3.2 Exact Analysis of the GHS Attack 

Let $E$ be an elliptic curve defined over $\mathbb{F}_{2^N}$, and let $P \in E(\mathbb{F}_{2^N})$ have prime order $r$. The GHS attack first uses the GHS reduction to yield an explicit group homomorphism $\Phi : \langle P \rangle \to J_C(\mathbb{F}_{2^l})$, where $C$ is a hyperelliptic curve defined over $\mathbb{F}_{2^l}$, and then uses the Enge-Gaudry index-calculus algorithm to solve the resulting HCDLP instance. For these parameters, the GHS reduction algorithm takes less than a minute on a workstation. Thus we do not include the running time of the GHS reduction in our analysis of the GHS attack.

If the coefficients of $E$ belong to $\mathbb{F}_{2^l}$, then $\#E(\mathbb{F}_{2^l})$ divides $\#E(\mathbb{F}_{2^N})$ and hence $r$ has bitlength at most $N - l + 1$. In this case, Pollard's rho algorithm can solve each ECDLP instance in at most $T_p'$ steps, which is significantly less than $T_p$. For example, if $(N, l) = (160, 32)$ then $T_p = 2^{79.8}$ and $T_p' = 2^{\ldots}$, and if $(N, l) = (600, 120)$ then $T_p = 2^{299.8}$ and $T_p' = 2^{\ldots}$. Therefore, we will henceforth assume that $E$ is not isomorphic to an elliptic curve defined over $\mathbb{F}_{2^l}$. In particular, the magic number $m$ (defined in (1)) is not 1, and hence it follows from [30, Corollary 9] that $m = 5$. Therefore $C$ has genus $g = 15$ or $16$. In fact, the vast majority of the $2^{N+1} - 2^{l+1}$ isomorphism classes of elliptic curves defined over $\mathbb{F}_{2^N} \setminus \mathbb{F}_{2^l}$ yield a genus 16 curve.

Theorem 3 The GHS reduction yields a genus 15 hyperelliptic curve $C$ defined over $\mathbb{F}_{2^l}$ for exactly $2^{4l+1} - 2$ isomorphism classes of elliptic curves defined over $\mathbb{F}_{2^N} \setminus \mathbb{F}_{2^l}$.
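Proof. Let $q = 2^l$. Let $E : y^2 + xy = x^3 + ax^2 + b$ with $b \in \mathbb{F}_{2^N} \setminus \mathbb{F}_{2^l}$, and let $\mathrm{Ord}_b(x)$ denote the unique monic polynomial $f \in \mathbb{F}_2[x]$ of least degree such that $f(\sigma)(b) = 0$. (If $f(x) = \sum_{i=0}^{d} c_i x^i$, then $f(\sigma)(b) = \sum_{i=0}^{d} c_i \sigma^i(b)$.) Let $t(x) = x^4 + x^3 + x^2 + x + 1$. Since $b \notin \mathbb{F}_{2^l}$, $\mathrm{Ord}_b(x) = x^5 - 1$ or $\mathrm{Ord}_b(x) = t(x)$.

Now, by [20, Corollary 6] we have $g = 15$ if and only if $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_{2^l}}(b^{1/2}) = 0$, which is the case if and only if $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_{2^l}}(b) = 0$. On the other hand, since $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_{2^l}}(b) = t(\sigma)(b)$, it is easy to see that $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_{2^l}}(b) = 0$ if and only if $\mathrm{Ord}_b(x) \mid t(x)$, which is the case if and only if $\mathrm{Ord}_b(x) = t(x)$. By [30, Corollary 8], the latter is true for exactly $2^{4l} - 1$ elements $b \in \mathbb{F}_{2^N} \setminus \mathbb{F}_{2^l}$. Since each such $b$ accounts for two isomorphism classes, the claim follows. □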
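The counting step at the end of the proof can be confirmed by brute force for tiny parameters. The sketch below (hypothetical function names) counts the $b \in \mathbb{F}_{2^{5l}} \setminus \mathbb{F}_{2^l}$ with $\mathrm{Tr}_{\mathbb{F}_{2^{5l}}/\mathbb{F}_{2^l}}(b) = b + b^q + \cdots + b^{q^4} = 0$ for $l = 1$ and $l = 2$, using the irreducible trinomials $z^5 + z^2 + 1$ and $z^{10} + z^3 + 1$ as field moduli (our choice, for illustration only). The expected counts are $2^{4l} - 1$, i.e. 15 and 255; these toy sizes are of course far below the range $l \in [32, 120]$ considered here.

```python
def make_field(N, mod):
    """Return a multiplication function for F_{2^N} = F_2[z]/(mod), elements as bit masks."""
    def mul(x, y):
        r = 0
        while y:
            if y & 1:
                r ^= x
            y >>= 1
            x <<= 1
            if (x >> N) & 1:
                x ^= mod
        return r
    return mul


def power(mul, x, e):
    r = 1
    while e:
        if e & 1:
            r = mul(r, x)
        x = mul(x, x)
        e >>= 1
    return r


def count_trace_zero(l, mod):
    """Count b outside the subfield F_{2^l} whose trace to F_{2^l} vanishes (N = 5l)."""
    N, q = 5 * l, 2 ** l
    mul = make_field(N, mod)
    count = 0
    for b in range(2 ** N):
        if power(mul, b, q) == b:        # b lies in the subfield F_{2^l}
            continue
        tr, bi = 0, b
        for _ in range(5):               # Tr(b) = b + b^q + ... + b^(q^4)
            tr ^= bi
            bi = power(mul, bi, q)
        count += tr == 0
    return count


print(count_trace_zero(1, 0b100101), count_trace_zero(2, 0b10000001001))  # 15 255
```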

In our analysis, we restrict ourselves to the case g = 16, for which the HCDLP 
is harder to solve than for g = 15. 

The Enge-Gaudry algorithm [17,7] for finding the logarithm of a divisor $D_2$ to the base $D_1$ in $J_C(\mathbb{F}_{2^l})$ has three stages. First, a smoothness bound $t \in [1, g]$ is selected and a factor base $\{P_1, P_2, \ldots, P_w\}$ is constructed which contains exactly one of $D$ and $-D$ for each prime divisor $D$ of degree less than or equal to $t$. In the second (relation generation) stage, a random walk is performed in the set of reduced divisors equivalent to divisors of the form $\alpha D_1 + \beta D_2$. Each $t$-smooth divisor encountered in this walk yields a relation $\alpha_i D_1 + \beta_i D_2 \sim \sum_{j=1}^{w} e_{ij} P_j$. After slightly more than $w$ such relations have been generated and stored, in the third stage one can find by linear algebra modulo $r$ a non-trivial linear combination $\sum_i \gamma_i (e_{i1}, e_{i2}, \ldots, e_{iw}) = (0, 0, \ldots, 0)$. Thus $\sum_i \gamma_i \alpha_i D_1 + \sum_i \gamma_i \beta_i D_2 = 0$, and then $\log_{D_1} D_2 = -\left(\sum_i \gamma_i \alpha_i\right) / \left(\sum_i \gamma_i \beta_i\right) \bmod r$ can be easily computed.

There is a one-to-one correspondence between points in $C(\mathbb{F}_{2^l})$ and degree one divisors in $J_C(\mathbb{F}_{2^l})$. The divisor corresponding to a point $(x, y) \in C(\mathbb{F}_{2^l})$ is ramified if and only if $h(x) = 0$, where $v^2 + h(u)v = f(u)$ is the Weierstrass equation of $C$; otherwise the divisor splits. According to the Hasse-Weil bound, $\#C(\mathbb{F}_{2^l}) = 2^l + 1 - \gamma$, where $|\gamma| \le 2g\sqrt{2^l}$. Hence $2^l$ is a very good approximation for the number of degree one divisors in $J_C(\mathbb{F}_{2^l})$. We select the smoothness bound $t = 1$, and then the size of the factor base is $w \approx 2^{l-1}$. Creating the factor base takes negligible time compared to the relation generation and matrix stages, so we ignore that stage in our running time analysis.

The number of 1-smooth divisors in $J_C(\mathbb{F}_{2^l})$ is approximately $(2^l)^g / g!$ [17, Proposition 4]. In fact, the exact number can be efficiently computed.

Lemma 4 Let $C$ be a hyperelliptic curve of genus $g$ over $\mathbb{F}_q$, and suppose that there are $A_1$ split degree one divisors and $B_1$ ramified degree one divisors in $J_C(\mathbb{F}_q)$ (so $A_1 + B_1 = \#C(\mathbb{F}_q)$, and $w = A_1/2 + B_1$). Then the number of 1-smooth divisors in $J_C(\mathbb{F}_q)$ is



where [ ] denotes the coefficient operator. 

Proof. Similar to the proof of Lemma 2 in [23].



□ 
Now, $A_1 \in [2^l + 1 - 2g\sqrt{2^l},\ 2^l + 1 + 2g\sqrt{2^l}]$, and $B_1 \in [\ldots]$. Using either the maximum possible values for $A_1$ and $B_1$, or the minimum values for $A_1$ and $B_1$, we verified that $M(l) \approx (2^l)^g / g!$ is indeed a very good approximation for each $l \in [32, 120]$. By Weil's theorem, the size of $J_C(\mathbb{F}_{2^l})$ satisfies $(\sqrt{2^l} - 1)^{2g} \le \#J_C(\mathbb{F}_{2^l}) \le (\sqrt{2^l} + 1)^{2g}$. Thus $\#J_C(\mathbb{F}_{2^l}) \approx 2^{lg}$ is a very good approximation when $l \in [32, 120]$. Hence the expected number of random walk steps before $w$ relations are obtained is $T_1 = w \cdot \#J_C(\mathbb{F}_{2^l}) / M(l) \approx 2^{l-1} g!$. For $g = 16$, we have

$$T_1 \approx 2^{l+43}.$$

The two dominant operations in a random walk step are an addition in $J_C(\mathbb{F}_{2^l})$ and a smoothness test. A polynomial $a(u)$ can be tested for 1-smoothness by first removing repeated factors (by performing a squarefree factorization) and then checking whether the resulting polynomial divides $u^{2^l} - u$. If $a(u)$ is found to be 1-smooth, then it can be factored using the Cantor-Zassenhaus algorithm [5]. We ignore the running time of the factorization step in our estimates because 1-smooth divisors are encountered relatively infrequently: once every $g! = 16! \approx 2^{44}$ random walk steps.
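A 1-smoothness test of this kind can be sketched as follows. To keep the example short, this toy version works over the prime field $\mathbb{F}_{13}$ instead of $\mathbb{F}_{2^l}$ (an assumption made only for illustration), and instead of performing an explicit squarefree factorization it repeatedly divides out $\gcd(a, u^{13} - u)$, the product of the distinct linear factors of $a$; a polynomial is 1-smooth exactly when this process reaches a constant. Polynomials are coefficient lists with the lowest degree first, and all helper names are hypothetical.

```python
P = 13  # toy prime field F_13 standing in for F_{2^l}


def trim(f):
    """Drop leading zero coefficients (lists are lowest degree first)."""
    while f and f[-1] == 0:
        f.pop()
    return f


def pmul(f, g):
    r = [0] * (len(f) + len(g) - 1) if f and g else []
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            r[i + j] = (r[i + j] + x * y) % P
    return trim(r)


def pdivmod(f, g):
    """Quotient and remainder of f by a nonzero g over F_P."""
    f = trim([c % P for c in f])
    q = [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], -1, P)
    while len(f) >= len(g):
        c, shift = f[-1] * inv % P, len(f) - len(g)
        q[shift] = c
        for i, y in enumerate(g):
            f[shift + i] = (f[shift + i] - c * y) % P
        trim(f)
    return trim(q), f


def pgcd(f, g):
    """Monic gcd of f and g (f nonzero)."""
    f, g = trim(f[:]), trim(g[:])
    while g:
        f, g = g, pdivmod(f, g)[1]
    inv = pow(f[-1], -1, P)
    return [c * inv % P for c in f]


def u_pow_mod(e, a):
    """u^e mod a by square-and-multiply."""
    result, base = [1], pdivmod([0, 1], a)[1]
    while e:
        if e & 1:
            result = pdivmod(pmul(result, base), a)[1]
        base = pdivmod(pmul(base, base), a)[1]
        e >>= 1
    return result


def is_one_smooth(a):
    """True iff a splits into linear factors over F_P (repeated roots allowed)."""
    a = trim([c % P for c in a])
    while len(a) > 1:
        h = u_pow_mod(P, a)
        h = h + [0] * max(0, 2 - len(h))
        h[1] = (h[1] - 1) % P            # h = u^P - u mod a
        g = pgcd(a, h)
        if len(g) == 1:                  # no root left: a has a factor of degree >= 2
            return False
        a = pdivmod(a, g)[0]             # strip one copy of each distinct linear factor
    return True


print(is_one_smooth([2, 0, 1]))  # False: u^2 + 2 has no root mod 13
```

In characteristic 2 the same idea applies with $u^{2^l} - u$, and the dominant cost is the chain of squarings that produces $u^{2^l} \bmod a$, which is the cost counted in $c_s$ below.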

The system of linear equations has dimension slightly more than $w$ and about $g$ non-zero coefficients per equation. It can be solved using Lanczos's algorithm [6], whose running time is closely approximated by $T_2 = gw^2$ arithmetic operations modulo $r$. We thus have

$$T_2 \approx 2^{2l+2}.$$

3.3 Comparisons 

In order to compare the cost of Pollard's rho method for solving the ECDLP in $E(\mathbb{F}_{2^N})$ with the cost of the Enge-Gaudry algorithm for solving the HCDLP in $J_C(\mathbb{F}_{2^l})$, we need to estimate the relative cost of the basic operations in these algorithms. Let $c_E$ denote the time to perform an elliptic curve addition in $E(\mathbb{F}_{2^N})$, $c_J$ the time to perform an addition in $J_C(\mathbb{F}_{2^l})$ (where $C$ has genus 16), $c_s$ the time to test whether a monic polynomial $a \in \mathbb{F}_{2^l}[u]$ of degree 16 is 1-smooth, and $c_r$ the time to perform a multiplication modulo $r$. Then the expected cost of Pollard's rho method is

$$R_p \approx c_E T_p,$$

the expected cost of the random walk stage of the Enge-Gaudry algorithm is

$$R_1 \approx (c_J + c_s) T_1 = (c_J + c_s)\, 2^{l+43},$$

and the expected cost of the matrix stage of the Enge-Gaudry algorithm is

$$R_2 \approx c_r T_2 = c_r 2^{2l+2}. \qquad (3)$$

A deficiency in the above comparison is that it only considers the total time taken, and not other scarce resources consumed such as memory, number of processors, and communications between processors. Pollard's rho method can be effectively parallelized (see [31]) so that its expected running time on a network of $S$ processors is $T_p/S$ steps. Moreover, the processors do not communicate with each other, and only occasionally transmit data to a central server. The amount of data stored at the server can be controlled without any noticeable impact on the running time (see [26]). Thus time is the only scarce resource consumed by (parallelized) Pollard's rho method.

The relation generation stage in the Enge-Gaudry algorithm can also be effectively parallelized with a speedup that is linear in the number of processors employed, and where the processors do not communicate with each other and only occasionally transmit data to a central server. However, it is not known whether the matrix stage can be parallelized in this way. Moreover, the matrix may have large storage requirements. Thus, in practice, the matrix stage may be the bottleneck in an application of the Enge-Gaudry algorithm. Note, however, that Bernstein [3] and Wiener [44] have recently shown that the full cost² of solving a $D$-dimensional system of sparse linear equations over $\mathbb{F}_2$ can be reduced from $\ldots$ to $\ldots$. Consequently, we are of the opinion that our comparisons of Pollard's rho method and the Enge-Gaudry algorithm that only consider running times are adequate and meaningful for determining the effectiveness of the GHS attack on elliptic curve cryptographic schemes. This reasoning is more sound when the time cost $c_r T_2$ is significantly less than $c_E T_p$.

To complete the comparisons, we need relative estimates for $c_E$, $c_J$, $c_s$ and $c_r$. When mixed affine-projective coordinates are employed, an elliptic curve operation in $E(\mathbb{F}_{2^N})$ requires 8 multiplications in $\mathbb{F}_{2^N}$. Thus $c_E \approx 8c_N$, where $c_N$ is the time to perform a multiplication in $\mathbb{F}_{2^N}$, and we have

$$R_p \approx c_N 2^{2.5(l+1)}. \qquad (4)$$

The dominant computation in smoothness testing is the evaluation of $u^{2^l} \bmod a$, where $a$ is a monic polynomial of degree (at most) 16.³ First, one iteratively computes and stores $u^{2i} \bmod a$ for $i \in [9, 15]$; this can be done with 224 multiplications in $\mathbb{F}_{2^l}$. Then, one can compute $u^{2^i} \bmod a$ for $5 \le i \le l$ by successive squarings; this can be done with $128(l - 4)$ multiplications in $\mathbb{F}_{2^l}$. Thus $c_s = (224 + 128(l-4))c_l = (128l - 288)c_l$, where $c_l$ is the time to perform a multiplication in $\mathbb{F}_{2^l}$.

The fastest algorithm known for performing the Jacobian arithmetic in a genus 16 hyperelliptic curve appears to be NUCOMP [24].⁴ The precise operation count of NUCOMP has not been worked out. However, Jacobson [22] has reported that the cost $c_J$ of a Jacobian addition using NUCOMP is less than the cost $c_s$ of computing $u^{2^l} \bmod a$. For example, he reports that $c_s \approx 2.3c_J$ when $l = 37$. The ratio $c_s/c_J$ grows with $l$ because the number of $\mathbb{F}_{2^l}$-multiplications for smoothness testing increases with $l$, while the number of $\mathbb{F}_{2^l}$-multiplications for Jacobian addition is independent of $l$. Thus the approximation $R_1 \approx c_s T_1$ is justified, and hence

$$R_1 \approx (128l - 288)\, 2^{l+43} c_l. \qquad (5)$$

² The full cost of an algorithm is its running time multiplied by the number of processors employed.

³ In practice, the squarefree factorization may not be performed. In that case, Gaudry's algorithm only considers 1-smooth divisors whose supports consist of points that all have coefficient 1. This does not significantly affect the expected number of random walk steps.

⁴ Experiments carried out by Jacobson [22] indicate that NUCOMP is faster than Cantor's algorithm and its variants [4,33] for hyperelliptic curves of genus > 7.

Finally, we need to estimate the relative costs $c_N$, $c_l$ and $c_r$ of a multiplication in $\mathbb{F}_{2^N}$, $\mathbb{F}_{2^l}$, and modulo $r$, respectively. We use the relative timings on a Pentium II 400 MHz reported by Hankerson [19] for his optimized implementation of multiplication in $\mathbb{F}_{2^N}$ and $\mathbb{F}_{2^l}$ using the methods of [27], and for integer multiplication with Barrett reduction [2]. Table 1 shows these costs for some selected fields.



Table 1. Time estimates for Pollard's rho method for solving an ECDLP instance in $E(\mathbb{F}_{2^N})$, and for the relation generation and matrix stages of the Enge-Gaudry algorithm for solving an HCDLP instance in $J_C(\mathbb{F}_{2^l})$, where $C$ is a genus 16 hyperelliptic curve. $c_N$, $c_l$ and $c_r$ are the relative times for a multiplication in $\mathbb{F}_{2^N}$, $\mathbb{F}_{2^l}$, and modulo an $N$-bit prime, respectively. The estimates for $R_p$, $R_1$, $R_2$ were derived using formulas (4), (5), (3), respectively. In columns 3, 4, 5, the time units are $\mathbb{F}_{2^N}$-multiplications, $\mathbb{F}_{2^l}$-multiplications, and modulo-$r$ multiplications, respectively. In columns 9, 10, 11, the time unit is an $\mathbb{F}_{2^l}$-multiplication.



  N     l    R_p/c_N   R_1/c_l   R_2/c_r   c_N    c_l   c_r    R_p       R_1     R_2
 160    32   2^82.5    2^87      2^66      7.7    1.0   5.8    …         2^87    2^69
 185    37   2^95      2^92      2^76      7.9    1.0   5.8    2^98      2^92    2^79
 210    42   2^107.5   2^97      2^86      10.3   1.0   8.0    2^110.5   2^97    2^89
 255    51   2^130     2^107     2^104     11.0   1.0   7.7    2^133     2^107   2^107
 385    77   2^195     2^133     2^156     15.0   1.0   9.4    2^199     2^133   2^160
 515   103   2^260     2^160     2^208     15.3   1.0   11.1   2^264     2^160   2^212
 600   120   2^302.5   2^177     2^242     17.7   1.0   12.5   2^306.5   2^177   2^246


Remark 5 (miscellaneous notes on Table 1)

(i) The relative times for $c_N$, $c_l$ and $c_r$ are, of course, dependent on the choice of algorithms, platform, and implementation. Nevertheless, we do not expect that these relative times will differ by large factors (e.g., greater than 4) from the “correct” times. For example, the ratios $c_N/c_l$ for the seven $(N, l)$ pairs of Table 1 that we obtained using the routines for field arithmetic in Victor Shoup's NTL package [37] are 2.3, 2.1, 3.8, 3.5, 6.0, 8.7, and 10.5.

(ii) We use the estimates for $R_p$, $R_1$ and $R_2$ to justify our main conclusion that the fields $\mathbb{F}_{2^N}$, where $N \in [185, 600]$ is divisible by 5, are weak for ECC. This statement becomes stronger as $N$ increases. In particular, it is debatable whether our estimates justify the conclusion that the field $\mathbb{F}_{2^{185}}$ is weak for ECC. This field is of special interest because it is explicitly included in the OAKLEY key agreement protocol that was proposed for Internet applications [32].

(iii) By selecting only a proportion of degree one divisors in the factor base, one can decrease the cost of the matrix stage at the expense of increasing the cost of the relation generation stage. More precisely, if the factor base size is reduced by a factor of $2^d$, then $R_2$ decreases by a factor of $2^{2d}$, while $R_1$ increases by a factor of $2^{d(g-1)} = 2^{15d}$. For example, if we select $d = 4$ for the case $N = 600$, then we obtain $R_1 = 2^{237}$ and $R_2 = 2^{238}$. We can then derive our claim made at the end of Section 1 that the ECDLP for all elliptic curves over $\mathbb{F}_{2^{600}}$ can be solved about $2^{\ldots}$ times faster than it takes Pollard's rho method to solve the hardest instances. Similarly, for $(N, d) = (385, 1.5)$ we have $(R_1, R_2) = (2^{155.5}, 2^{157})$, and for $(N, d) = (515, 3)$ we have $(R_1, R_2) = (2^{205}, 2^{206})$.
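The bookkeeping in (iii) is simple exponent arithmetic. The hypothetical helper below applies it to the $R_1$ and $R_2$ columns of Table 1 (in $\mathbb{F}_{2^l}$-multiplications). Our reading of the trade-off, stated as an assumption: with $2^d$ times fewer factor-base elements, a random divisor is about $2^{dg}$ times less likely to be 1-smooth while $2^d$ times fewer relations are needed, and the $g w^2$ matrix cost shrinks by $2^{2d}$.

```python
def shrink_factor_base(log2_r1, log2_r2, d, g=16):
    """Factor base reduced by 2^d: R_1 grows by 2^(d(g-1)), R_2 shrinks by 2^(2d)."""
    return log2_r1 + d * (g - 1), log2_r2 - 2 * d


# (N, d, log2 R_1, log2 R_2) with R_1, R_2 taken from the last columns of Table 1
for N, d, r1, r2 in [(600, 4, 177, 246), (385, 1.5, 133, 160), (515, 3, 160, 212)]:
    print(N, d, shrink_factor_base(r1, r2, d))
```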



4 The Fields $\mathbb{F}_{2^N}$ with $N = 4l$

Smart [39] presented some experimental evidence that the fields $\mathbb{F}_{2^N}$, where $N$ is divisible by 4, are weak for ECC. In this section we repeat our analysis from Section 3 of the GHS attack for the ECDLP over these fields, and precisely quantify the weakness of the fields. We conclude that the fields $\mathbb{F}_{2^{4l}}$ exhibit some signs of being weak, but are not as weak as the fields $\mathbb{F}_{2^{5l}}$.

Let $N = 4l$. If $E$ is an elliptic curve defined over $\mathbb{F}_{2^N} \setminus \mathbb{F}_{2^{2l}}$, then it follows from [30, Theorems 5, 6] that the magic number $m$ (defined in (1)) is 3 or 4. By [20, Corollary 6], the GHS reduction yields a hyperelliptic curve $C$ of genus 4 or 8, respectively, over $\mathbb{F}_{2^l}$. The genus of $C$ is 8 in a vast majority of the cases. Since this yields the worse running time for the Enge-Gaudry algorithm, we focus on this case.

Arguing as in Section 3, we have

$$R_p \approx c_N 2^{2l+2.5}, \qquad R_1 \approx c_l (32l - 48)\, 2^{l+14}, \qquad R_2 \approx c_r 2^{2l+1}.$$

Hence the running time of Pollard's rho algorithm is very close to the running time of the matrix stage. However, if the factor base size is reduced by a factor of $2^d$, then $R_2$ decreases by a factor of $2^{2d}$, while $R_1$ increases by a factor of $2^{7d}$ (cf. Remark 5(iii)). Table 2 lists the costs $R_p$, $R_1$, $R_2$ for some selected fields and choices of $d$ that roughly balance $R_1$ and $R_2$.



5 The Field $\mathbb{F}_{2^{210}}$

In this section we argue that the field $\mathbb{F}_{2^{210}}$ is particularly weak for ECC. Recall from Section 3.3 that $R_p \approx c_l 2^{110.5}$, $R_1 \approx c_l 2^{97}$, and $R_2 \approx c_r 2^{86}$ for the parameters $(N, l) = (210, 42)$. We next consider the GHS attack with parameters $(N, n, m) = (210, 6, 5)$.
Table 2. Time estimates for Pollard's rho method for solving an ECDLP instance in $E(\mathbb{F}_{2^N})$, and for the relation generation and matrix stages of the Enge-Gaudry algorithm for solving an HCDLP instance in $J_C(\mathbb{F}_{2^l})$, where $C$ is a genus 8 hyperelliptic curve. The factor base in the Enge-Gaudry algorithm consists of $1/2^d$ of all degree-one prime divisors. $c_N$, $c_l$ and $c_r$ are the times for a multiplication in $\mathbb{F}_{2^N}$, $\mathbb{F}_{2^l}$ and modulo an $N$-bit prime, respectively.



  N     l     d   R_p/c_N   R_1/c_l   R_2/c_r
 160    40    2   2^82.5    2^78      2^77
 192    48    3   2^98.5    2^94      2^91
 224    56    4   2^114.5   2^109     2^104
 256    64    5   2^130.5   2^124     2^119
 384    96    8   2^194.5   2^178     2^177
 512   128   12   2^258.5   2^238     2^233
 600   150   14   2^302.5   2^274     2^273


5.1 Exact Analysis of the GHS Attack with $(N, n, m) = (210, 6, 5)$

About $2^{175}$ isomorphism classes of elliptic curves over $\mathbb{F}_{2^{210}}$ have magic number $m = 5$ relative to $n = 6$, as a consequence of which there exists an even more effective ECDLP solver for at least about 25% of all elliptic curves over $\mathbb{F}_{2^{210}}$. We first analyze the GHS attack in this case, and then discuss how to extend it beyond these $2^{175}$ isomorphism classes.

Let $E: y^2 + xy = x^3 + ax^2 + b$ be an elliptic curve over $\mathbb{F}_{2^{210}}$ with magic number $m = 5$ relative to $n = 6$. First note that by [28, Lemma 8] and [20] we require that $\mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(a) = 0$ for the GHS reduction to yield a group homomorphism $\Phi: E(\mathbb{F}_{2^{210}}) \to J_C(\mathbb{F}_{2^{35}})$ into the Jacobian of a hyperelliptic curve $C$ defined over $\mathbb{F}_{2^{35}}$. We thus restrict ourselves to curves of the form $E: y^2 + xy = x^3 + b$.
Then $m = 5$ only if $b \in \mathbb{F}_{2^{210}} \setminus \mathbb{F}_{2^{35}}$, while if $b \in \mathbb{F}_{2^{35}}$ we have $m = 1$ and $\#E(\mathbb{F}_{2^{35}}) \mid \#E(\mathbb{F}_{2^{210}})$. Again, the GHS reduction takes only a few seconds. The resulting hyperelliptic curve has genus 15 or 16. As in Theorem 3, this time using $t(x) = x^4 + x^2 + 1$ and taking into account that $\mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(a) = 0$, we can show that there are exactly $2^{140} - 2^{70}$ isomorphism classes of elliptic curves defined over $\mathbb{F}_{2^{210}} \setminus \mathbb{F}_{2^{35}}$ for which the GHS attack yields a hyperelliptic curve $C$ defined over $\mathbb{F}_{2^{35}}$ of genus $g = 15$. For exactly $(2^{140} - 2^{70})(2^{35} - 1) \approx 2^{175}$ isomorphism classes, a genus 16 curve is obtained.
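The two counts can be reproduced from the structure of $\mathbb{F}_{2^{210}}$ under the power-$q$ Frobenius, where $q = 2^{35}$. The sketch below, which is our own, relies on a standard fact. By the normal basis theorem, $\mathbb{F}_{q^n} \cong \mathbb{F}_q[x]/(x^n - 1)$ as an $\mathbb{F}_q[x]$-module. Consequently, the number of elements whose order polynomial is exactly $m(x) = \prod p_i(x)^{e_i}$ is $\prod (q^{e_i \deg p_i} - q^{(e_i - 1)\deg p_i})$.

The sketch also assumes that $x + 1$ and $x^2 + x + 1$ stay irreducible over $\mathbb{F}_q$. This holds here because 35 is odd. The proof of Theorem 6 lists two possibilities, $\mathrm{Ord}_b = (x^2+x+1)^2$ and $\mathrm{Ord}_b = (x+1)(x^2+x+1)^2$. The counts match the text if the genus-15 case corresponds to the first and the genus-16 case to the second. The helper `order_count` is hypothetical.

```python
import math


def order_count(factors: list[tuple[int, int]], q: int) -> int:
    """Number of elements of F_{q^n} whose order polynomial (w.r.t. the
    q-power Frobenius) is exactly prod p_i^e_i, given as (deg p_i, e_i).

    Assumes the p_i are distinct, irreducible over F_q, and that the
    product divides x^n - 1.
    """
    count = 1
    for deg_p, e in factors:
        count *= q ** (deg_p * e) - q ** (deg_p * (e - 1))
    return count


q = 2 ** 35
genus15 = order_count([(2, 2)], q)          # Ord_b = (x^2 + x + 1)^2
genus16 = order_count([(1, 1), (2, 2)], q)  # Ord_b = (x + 1)(x^2 + x + 1)^2

print(genus15 == 2**140 - 2**70)
print(genus16 == (2**140 - 2**70) * (2**35 - 1))
print(round(math.log2(genus16), 6))
```

The sum of the two counts is $2^{175} - 2^{105}$, which is the total number of classes $E_{0,b}$ with $(n, m) = (6, 5)$.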

Using the Enge-Gaudry index-calculus algorithm with a factor base of $w \approx 2^{34}$ degree-one prime divisors, the relation generation stage requires an expected $T_1 \approx 2^{34+560}/N_{16}(1)$ random-walk steps in the Jacobian. For the case $g = 16$, we get $R_1 \approx c_{35}\, 2^{90}$ and $R_2 \approx c_r\, 2^{72}$.

5.2 Extended GHS Attack 

We next discuss how to extend the GHS attack beyond the set of elliptic curves with magic number 5 relative to $n = 6$. We first classify the curves over $\mathbb{F}_{2^{210}}$ with $(n, m) = (6, 5)$.
Theorem 6 Let $N \equiv 0 \pmod 6$, and let $E: y^2 + xy = x^3 + ax^2 + b$ be an elliptic curve over $\mathbb{F}_{2^N}$ with magic number $m = 5$ relative to $n = 6$. Then $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$.

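Proof. Let $N = nl$, $q = 2^l$, and let $\sigma: \mathbb{F}_{2^N} \to \mathbb{F}_{2^N}$ denote the power-$q$ Frobenius $\alpha \mapsto \alpha^q$. Since $x^6 - 1 = (x - 1)^2 (x^2 + x + 1)^2$ over $\mathbb{F}_2$, there are two possibilities to obtain magic number 5, namely $\mathrm{Ord}_b(x) = (x - 1)^j (x^2 + x + 1)^2$ with $j = 0$ or $j = 1$. If $j = 0$, then $0 = \mathrm{Ord}_b(\sigma)(b) = b^{q^4} + b^{q^2} + b$. Thus,
$$\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = \sum_{i=0}^{2l-1} \big(b + b^{q^2} + b^{q^4}\big)^{2^i} = 0.$$
If $j = 1$, then $0 = \mathrm{Ord}_b(\sigma)(b) = b^{q^5} + b^{q^4} + b^{q^3} + b^{q^2} + b^q + b$. Hence
$$\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = \sum_{i=0}^{l-1} \big(b + b^q + b^{q^2} + b^{q^3} + b^{q^4} + b^{q^5}\big)^{2^i} = 0.$$

□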
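Theorem 6 can be checked exhaustively in the smallest case, $N = 6$. Here $l = 1$ and $q = 2$, so $\sigma$ is ordinary squaring. The sketch below is our own illustration.

- It represents $\mathbb{F}_{2^6}$ as $\mathbb{F}_2[z]/(z^6 + z + 1)$. This is an arbitrary choice of irreducible modulus.
- It computes $\mathrm{Ord}_b$ as the lowest-degree divisor of $x^6 - 1$ that annihilates $b$.
- It confirms that every $b$ with $\mathrm{Ord}_b \in \{(x^2+x+1)^2, (x+1)(x^2+x+1)^2\}$, the two cases in the proof, has trace 0.

```python
MOD, N = 0b1000011, 6   # F_{2^6} = F_2[z]/(z^6 + z + 1), bit i <-> z^i


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of F_{2^6}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r


def apply_poly(f: int, a: int) -> int:
    """f(sigma)(a) for f in F_2[x] (bit i <-> x^i), sigma = squaring (q = 2)."""
    r, power = 0, a
    while f:
        if f & 1:
            r ^= power
        f >>= 1
        power = gf_mul(power, power)
    return r


def pmul(f: int, g: int) -> int:
    """Carry-less product in F_2[x]."""
    r = 0
    while g:
        if g & 1:
            r ^= f
        g >>= 1
        f <<= 1
    return r


def ppow(f: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = pmul(r, f)
    return r


X1, X2 = 0b11, 0b111   # x + 1 and x^2 + x + 1
# All divisors (x+1)^i (x^2+x+1)^j of x^6 - 1 = (x+1)^2 (x^2+x+1)^2 over F_2.
DIVISORS = [pmul(ppow(X1, i), ppow(X2, j)) for i in range(3) for j in range(3)]


def order_poly(b: int) -> int:
    """Ord_b: the lowest-degree divisor of x^6 - 1 that kills b."""
    return min((f for f in DIVISORS if apply_poly(f, b) == 0), key=int.bit_length)


TRACE = 0b111111                                  # 1 + x + ... + x^5
MAGIC5 = {ppow(X2, 2), pmul(X1, ppow(X2, 2))}     # the two cases j = 0, 1

magic5 = [b for b in range(1, 2 ** N) if order_poly(b) in MAGIC5]
print(len(magic5), all(apply_poly(TRACE, b) == 0 for b in magic5))
```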

The following result is probably well known. We include a proof since we 
could not find it elsewhere. 

Lemma 7 Let $E: y^2 + xy = x^3 + b$ be an elliptic curve over $\mathbb{F}_{2^N}$, where $N \ge 3$. Then $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$ if and only if $\#E(\mathbb{F}_{2^N}) \equiv 0 \pmod 8$.

Proof. Since the $2^s$-torsion group ($s \in \mathbb{N}$) of $E$ over the algebraic closure $\overline{\mathbb{F}}_{2^N}$ is cyclic, it is equivalent to show that $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$ if and only if $E(\mathbb{F}_{2^N})$ has a point of order 8. The 8th division polynomial of $E$ is given by
$$f_8(x) = x^{28} + (b^2 + b)x^{20} + (b^4 + b^3)x^{12} + b^6 x^4 = (x^6 + bx^2)^2 (x^{16} + bx^8 + b^4),$$
where $x^6 + bx^2$ is the 4th division polynomial. In the last factor, we substitute $z$ for $x^8$. Then $E$ has a point of order 8 with $x$-coordinate defined over $\mathbb{F}_{2^N}$ if and only if $z^2 + bz + b^4$ factors over $\mathbb{F}_{2^N}$. This is the case if and only if $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b^2) = \mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$.

It remains to show that if $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$, then the $y$-coordinate is also defined over $\mathbb{F}_{2^N}$. So assume $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$. Let $\beta \in \mathbb{F}_{2^N}$ be such that $b = \beta^2 + \beta$, and let $t \in \mathbb{F}_{2^N}$ be such that $\beta = t^8$. Then
$$z^2 + bz + b^4 = (z + t^{32} + t^{24} + t^{16} + t^8)(z + t^{32} + t^{24}),$$
and the eighth root $t^4 + t^3$ of $z = t^{32} + t^{24}$ is the $x$-coordinate of the 8-division point.

It is easily verified that $y = t^8 + t^5 \in \mathbb{F}_{2^N}$ is an appropriate $y$-coordinate. □
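The explicit formulas in the proof, and the lemma itself for small $N$, can be checked by machine. The sketch below is our own and works in two parts.

1. It verifies in $\mathbb{F}_2[t]$ that $x = t^4 + t^3$ and $y = t^8 + t^5$ satisfy $y^2 + xy = x^3 + b$ for $b = t^{16} + t^8$, that is, for $\beta = t^8$. It also checks that $z = x^8$ is a root of $z^2 + bz + b^4$.
2. It counts points by brute force on $y^2 + xy = x^3 + b$ over $\mathbb{F}_{2^N}$ for $3 \le N \le 6$, using irreducible moduli of our choosing, and compares $\#E \bmod 8$ with the trace of $b$.

```python
# Part 1: the explicit point from the proof, checked in F_2[t] (bit i <-> t^i).
def pmul(f: int, g: int) -> int:
    """Carry-less product in F_2[t]."""
    r = 0
    while g:
        if g & 1:
            r ^= f
        g >>= 1
        f <<= 1
    return r


def tpoly(*exps: int) -> int:
    """Sum of t^e over the given exponents."""
    r = 0
    for e in exps:
        r ^= 1 << e
    return r


x, y, b = tpoly(4, 3), tpoly(8, 5), tpoly(16, 8)    # b = beta^2 + beta, beta = t^8
identity_ok = (pmul(y, y) ^ pmul(x, y)) == (pmul(pmul(x, x), x) ^ b)
z = pmul(pmul(x, x), pmul(x, x))                    # x^4
z = pmul(z, z)                                      # z = x^8
b2 = pmul(b, b)
root_ok = (pmul(z, z) ^ pmul(b, z) ^ pmul(b2, b2)) == 0


# Part 2: brute-force point counts on y^2 + xy = x^3 + b over F_{2^n}.
MODULI = {3: 0b1011, 4: 0b10011, 5: 0b100101, 6: 0b1000011}  # irreducible


def gf_mul(a: int, c: int, n: int, mod: int) -> int:
    r = 0
    while c:
        if c & 1:
            r ^= a
        c >>= 1
        a <<= 1
        if a >> n:
            a ^= mod
    return r


def trace(a: int, n: int, mod: int) -> int:
    """Absolute trace a + a^2 + ... + a^(2^(n-1)), which lies in {0, 1}."""
    s = 0
    for _ in range(n):
        s ^= a
        a = gf_mul(a, a, n, mod)
    return s


def count_points(bb: int, n: int, mod: int) -> int:
    size = 1 << n
    sq = [gf_mul(v, v, n, mod) for v in range(size)]
    total = 1                                       # point at infinity
    for xx in range(size):
        rhs = gf_mul(sq[xx], xx, n, mod) ^ bb       # x^3 + b
        total += sum(1 for yy in range(size)
                     if (sq[yy] ^ gf_mul(xx, yy, n, mod)) == rhs)
    return total


def lemma7_holds(n: int) -> bool:
    mod = MODULI[n]
    return all((count_points(bb, n, mod) % 8 == 0) == (trace(bb, n, mod) == 0)
               for bb in range(1, 1 << n))


print(identity_ok, root_ok, [lemma7_holds(n) for n in MODULI])
```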



Corollary 8 Let $E: y^2 + xy = x^3 + ax^2 + b$ be an elliptic curve over $\mathbb{F}_{2^{210}}$ with magic number $m = 5$ relative to $n = 6$. Suppose that the GHS reduction yields a group homomorphism $\Phi: E(\mathbb{F}_{2^{210}}) \to J_C(\mathbb{F}_{2^{35}})$ into the Jacobian of a hyperelliptic curve $C$ defined over $\mathbb{F}_{2^{35}}$ (of genus 15 or 16). Then $\#E(\mathbb{F}_{2^{210}}) \equiv 0 \pmod 8$.

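Proof. By [28, Lemma 8] and [20], we require that $\mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(a) = 0$ for the GHS reduction to work when $(n, m) = (6, 5)$. In that case, $E$ is isomorphic over $\mathbb{F}_{2^{210}}$ to $y^2 + xy = x^3 + b$. The statement then follows immediately from Theorem 6 and Lemma 7. □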
Extending the GHS attack to the entire isogeny class of a weak curve over $\mathbb{F}_{2^{210}}$. We argue that for all but a few elliptic curves $E$ with $\#E(\mathbb{F}_{2^{210}}) \equiv 0 \pmod 8$, any ECDLP instance can be solved in running time essentially $R_1 \approx c_{35}\, 2^{90}$.

An isogeny between two elliptic curves $E$ and $E'$ over a field $K$ is a non-constant morphism $\psi: E \to E'$ that maps the neutral element of $E$ to the neutral element of $E'$. The curves $E$ and $E'$ are called isogenous over $K$ if there exists such a $\psi$ defined over $K$; we write $E \sim E'$. If $K$ is a finite field, then $E \sim E'$ if and only if $\#E(K) = \#E'(K)$. The equivalence classes with respect to isogeny are called isogeny classes.

Let $E$ be a non-supersingular elliptic curve over $\mathbb{F}_{2^N}$. We call $t = 2^N + 1 - \#E(\mathbb{F}_{2^N})$ its trace and $\Delta = t^2 - 4 \cdot 2^N$ its discriminant; note that $\Delta < 0$ and $\Delta \equiv 1 \pmod 8$. The endomorphism ring $\mathrm{End}(E)$ of $E$ is an order in the maximal order $\mathcal{O}$ of the imaginary quadratic number field $\mathbb{Q}(\sqrt{\Delta})$. More precisely, $\mathbb{Z}[\pi] \subseteq \mathrm{End}(E) \subseteq \mathcal{O}$, where $\pi: E \to E$ is the $2^N$-th power Frobenius map on $E$. The endomorphism class of $E$, denoted by $\mathcal{C}(E)$, is the set of all isogenous, non-isomorphic curves $E'$ with $\mathrm{End}(E) = \mathrm{End}(E')$. There exists a one-to-one correspondence between $\mathcal{C}(E)$ and the Picard group of the order $\mathrm{End}(E)$, denoted $\mathrm{Cl}(\mathrm{End}(E))$ ([10, Th. 3.4.6]).
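The two properties of $\Delta$ follow from two facts. First, a non-supersingular curve over $\mathbb{F}_{2^N}$ has odd trace $t$. Second, the Hasse bound gives $t^2 \le 4 \cdot 2^N$, and this bound cannot be attained by an odd $t$. The following minimal check over all admissible odd traces for a few small $N$ is our own illustration:

```python
import math


def discriminant(N: int, t: int) -> int:
    """Delta = t^2 - 4 * 2^N for the trace t of a curve over F_{2^N}."""
    return t * t - 4 * 2 ** N


def odd_traces(N: int) -> range:
    """All odd integers t with t^2 <= 4 * 2^N (the Hasse interval)."""
    bound = math.isqrt(4 * 2 ** N)
    start = -bound if bound % 2 else -bound + 1
    return range(start, bound + 1, 2)


d = discriminant(4, 1)
print(d, d % 8)   # -63 1
```

Python's `%` returns a non-negative residue for a positive modulus, so `d % 8 == 1` expresses $\Delta \equiv 1 \pmod 8$ directly, even though $\Delta$ is negative.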

For any elliptic curve $E$ over $\mathbb{F}_{2^{210}}$ we can use an algorithm of Kohel [25] to compute a chain of isogenies defined over $\mathbb{F}_{2^{210}}$ from $E$ to an elliptic curve $E'$ with $\mathrm{End}(E') = \mathcal{O}$. The running time is polynomial in $s$, where $s$ is the largest prime dividing the conductor $c = [\mathcal{O} : \mathrm{End}(E)]$ of $\mathrm{End}(E)$. Note that $c$ divides $[\mathcal{O} : \mathbb{Z}[\pi]]$. In practice, $[\mathcal{O} : \mathbb{Z}[\pi]]$ is small and smooth, so Kohel's algorithm takes negligible time compared to $R_1$. In what follows, we may therefore assume that $\mathrm{End}(E)$ is maximal. Then $\mathrm{Cl}(\mathrm{End}(E))$ is the ideal class group of the maximal order $\mathcal{O}$, which we simply denote by $\mathrm{Cl}$.

Now, there exist $2^{209}$ isomorphism classes of elliptic curves $E_{0,b}$ over $\mathbb{F}_{2^{210}}$ with $\mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(b) = 0$ (i.e., with group order divisible by 8), and $2^{175} - 2^{105}$ elliptic curves $E_{0,b}$ over $\mathbb{F}_{2^{210}}$ with $(n, m) = (6, 5)$. It is therefore reasonable to expect that a randomly chosen elliptic curve over $\mathbb{F}_{2^{210}}$ with group order divisible by 8 has magic number 5 relative to $n = 6$ with probability approximately $2^{175}/2^{209} = 2^{-34}$. Moreover, we make the heuristic assumption that the same holds when $E$ is chosen randomly from a fixed endomorphism class.
Assumption A. Let $E = E_{0,b}$ be an elliptic curve over $\mathbb{F}_{2^{210}}$ with $\#E_{0,b}(\mathbb{F}_{2^{210}}) \equiv 0 \pmod 8$, and suppose that $\mathcal{C}(E)$ contains a curve with magic number $m = 5$ relative to $n = 6$. Then any curve $E'$ chosen randomly from $\mathcal{C}(E)$ (with respect to the uniform distribution) has magic number $m = 5$ relative to $n = 6$ with probability $2^{-34}$.



Remark 9 (further justification of Assumption A) For arbitrary $N \equiv 0 \pmod 6$, let $N = 6l$ and $q = 2^l$. By [30, Theorem 5], there exist $q^5 - q^3 \approx 2^{5N/6}$ isomorphism classes of elliptic curves $E_{0,b}$ over $\mathbb{F}_{2^N}$ with $(n, m) = (6, 5)$, while there exist $2^{N-1}$ isomorphism classes $E_{0,b}$ with $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(b) = 0$. Thus, the probability in Assumption A generalizes to $2^{1 - N/6}$. For $36 \le N \le 84$, this has been confirmed in extensive experiments.
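The generalized probability can be evaluated exactly. The sketch below is our own. It takes the count $q^5 - q^3$ of classes with $(n, m) = (6, 5)$, which is the sum of the two order-polynomial cases from the proof of Theorem 6, and divides it by $2^{N-1}$. The deviation of the base-2 logarithm from $1 - N/6$ is of order $q^{-2}$, so it vanishes quickly as $N$ grows.

```python
import math


def magic5_log2_probability(N: int) -> float:
    """log2((q^5 - q^3) / 2^(N-1)) with q = 2^(N/6); N must be divisible by 6."""
    if N % 6:
        raise ValueError("N must be divisible by 6")
    q = 2 ** (N // 6)
    return math.log2(q ** 5 - q ** 3) - (N - 1)


for N in (36, 60, 84, 210):
    print(N, round(magic5_log2_probability(N), 4), 1 - N / 6)
```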



Remark 10 (restriction of Assumption A) Of course, Assumption A is not accurate if $\mathrm{Cl}$ is very small, of the order of $210 \cdot 2^{34} \approx 2^{42}$ or smaller. This can happen in two ways:

- $\#E(\mathbb{F}_{2^{210}})$ lies at the extreme ends of the Hasse interval, so that $|\Delta|$ is significantly smaller than its expected value of about $2^{212}$. However, $|\Delta|$ is small only when $|t|$ is very close to the Hasse bound $2^{106}$, which affects only a very small fraction of the elliptic curves over $\mathbb{F}_{2^{210}}$.
- $\Delta$ has a very large square factor. If $|\Delta|$ is not small and $\Delta = f^2 d$ with $d \equiv 1 \pmod 8$ and squarefree, then $\#\mathrm{Cl} < 2^{42}$ only if $f$ is very large, which is most unlikely for non-subfield curves.



Remark 11 (the exceptional set in Assumption A) Our reasoning below does not apply when $\mathcal{C}(E)$ does not contain a curve with $(n, m) = (6, 5)$. We expect this to be a very rare case that, again, should only happen when $\mathrm{Cl}$ is very small. As with the previous remark, this affects only a tiny proportion of elliptic curves over $\mathbb{F}_{2^{210}}$ with group order divisible by 8.

Given a curve $E$ over $\mathbb{F}_{2^{210}}$ with group order divisible by 8, it is now possible to compute a curve $E'$ over $\mathbb{F}_{2^{210}}$ that is isogenous to $E$ and has $(n, m) = (6, 5)$, along with a chain of low-degree isogenies from $E$ to $E'$. The method builds on ideas from [15]: it simulates a random walk in the endomorphism class of $E$, exploiting the one-to-one correspondence between $\mathrm{Cl}$ and $\mathcal{C}(E)$ described above.

This works as follows. Let $E = E_{0,b}$, let $j(E) = b^{-1}$ be its $j$-invariant, and let $p$ be a prime with $\left(\frac{\Delta}{p}\right) = 1$. Then $p$ splits in $\mathcal{O}$ as $(p) = \mathfrak{p}_1 \mathfrak{p}_2$, and the modular polynomial $\Phi_p(j(E), X)$ has two roots $j_1$ and $j_2$ in $\mathbb{F}_{2^{210}}$ [13]. These roots can be computed by a probabilistic algorithm using $O(210\, p^2)$ operations in $\mathbb{F}_{2^{210}}$. The two isogenies mapping $E$ to $E_{0,j_1^{-1}}$ and $E_{0,j_2^{-1}}$ correspond to the multiplication of a fixed ideal, say $\mathcal{O}$, by the two prime ideals $\mathfrak{p}_1$ and $\mathfrak{p}_2$ lying over $p$. As explained in [15], it is easy to determine whether $j_1$ corresponds to $\mathfrak{p}_1$ or $\mathfrak{p}_2$.

Now let $\mathcal{P}$ be the set of the 30 smallest primes $p$ that satisfy $\left(\frac{\Delta}{p}\right) = 1$ and whose pairs of ideal classes (those of the prime ideals lying over $p$) are pairwise distinct in $\mathrm{Cl}$. A pseudo-random walk $(E_i)$ in $\mathcal{C}(E)$ is defined as follows. Let $E_0 = E_{0,b}$, $b_0 = b$ and $\mathfrak{a}_0 = \mathcal{O}$. For $i = 1, 2, \ldots$:

1. choose $p \in \mathcal{P}$ at random and let $j = b_{i-1}^{-1}$;
2. compute the two roots in $\mathbb{F}_{2^{210}}$ of $\Phi_p(j, X)$;
3. let $j'$ be one of these roots, and set $b_i = j'^{-1}$ and $E_i = E_{0,b_i}$.

Simultaneously, a chain $(\mathfrak{a}_i)$ of ideals in $\mathrm{Cl}$ is computed such that for each index $k$, the ideal $\mathfrak{a}_k$ corresponds to the isogeny mapping $E$ to $E_k$.
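The bookkeeping of this walk can be illustrated with a toy model. This is our own simplification, not the procedure of [15], and it computes no modular polynomials. The model makes three substitutions:

- The class group $\mathrm{Cl}$ is replaced by a cyclic group $\mathbb{Z}/h\mathbb{Z}$.
- The two prime ideals above each $p \in \mathcal{P}$ are replaced by a pair $\pm s_p$ of hypothetical step values.
- The property "magic number 5 relative to $n = 6$" is replaced by a pseudo-random predicate of density $2^{-k}$, in the spirit of Assumption A.

The running sum `acc` plays the role of the ideal $\mathfrak{a}_k$: it records which class links $E$ to the current curve. We use $k = 6$ instead of 34 so that the example runs quickly.

```python
import hashlib
import random


def has_property(x: int, k: int) -> bool:
    """Hypothetical stand-in for '(n, m) = (6, 5)': holds with density 2^-k."""
    digest = hashlib.sha256(x.to_bytes(16, "big")).digest()
    return (int.from_bytes(digest[:4], "big") >> (32 - k)) == 0


def walk(h: int, steps: list[int], start: int, k: int,
         rng: random.Random, max_steps: int = 10**6) -> tuple[int, int, int]:
    """Random walk in Z/hZ until has_property holds.

    Returns (number of steps, final position, accumulated step sum acc),
    so that (start + acc) % h equals the final position.
    """
    pos, acc = start % h, 0
    for i in range(max_steps):
        if has_property(pos, k):
            return i, pos, acc
        s = rng.choice(steps) * rng.choice((1, -1))   # p_1 or p_2 above p
        pos = (pos + s) % h
        acc += s
    raise RuntimeError("walk did not hit the target set")


if __name__ == "__main__":
    rng = random.Random(2004)
    h = 1_000_003                                     # toy class number
    steps = [rng.randrange(1, h) for _ in range(30)]  # 30 hypothetical classes
    runs = [walk(h, steps, rng.randrange(h), 6, rng) for _ in range(1000)]
    print(sum(r[0] for r in runs) / len(runs))        # typically near 2^6
```

The average walk length in this model is close to $2^k$, which mirrors the expected $2^{34}$ steps used in the estimate for $\mathbb{F}_{2^{210}}$.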

The set $\mathcal{P}$ has been chosen such that the walk $(E_i)$ indeed simulates a random walk in the endomorphism class of $E$. We considered 5000 randomly chosen discriminants and found experimentally that $\max\{p \in \mathcal{P}\} \in [190, 530]$. Only 2 of these discriminants yielded maximum values $> 500$. If we required only $\#\mathcal{P} = 20$, we obtained $\max\{p \in \mathcal{P}\} \in [150, 380]$. Computing the roots of the modular polynomial is by far the most time-consuming part of a step, so each random-walk step takes up to about $210 \cdot 500^2 \approx 2^{26}$ operations in $\mathbb{F}_{2^{210}}$.

Now, under Assumption A, an expected $2^{34}$ random-walk steps in $\mathcal{C}(E)$ suffice to encounter an elliptic curve over $\mathbb{F}_{2^{210}}$ that is isogenous to $E$ and has magic number $m = 5$ relative to $n = 6$. Thus, finding such a curve for a given curve over $\mathbb{F}_{2^{210}}$ takes altogether something on the order of $2^{60}$ operations in $\mathbb{F}_{2^{210}}$. The walk also yields an ideal $\mathfrak{a}$ that represents the isogeny between the two curves. We note that this running time is negligible compared to $R_1$ and $R_2$. Also, this step can be efficiently parallelized.
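The overall estimate is simple arithmetic on the figures quoted in the last two paragraphs. This sketch assumes a per-step cost of $210 \cdot p^2$ field operations with $p \le 500$, and an expected $2^{34}$ steps.

```python
import math

EXPECTED_STEPS_LOG2 = 34       # 1 / Pr[(n, m) = (6, 5)] under Assumption A
P_MAX = 500                    # largest prime in P observed in practice
per_step = 210 * P_MAX ** 2    # O(210 p^2) F_{2^210}-operations for the roots
per_step_log2 = math.log2(per_step)
total_log2 = EXPECTED_STEPS_LOG2 + per_step_log2

print(f"per step ~ 2^{per_step_log2:.1f}, total ~ 2^{total_log2:.1f}")
# per step ~ 2^25.6, total ~ 2^59.6
```

Both figures are counted in operations in $\mathbb{F}_{2^{210}}$, so the total is far below the exponent 90 in $R_1 \approx c_{35}\, 2^{90}$.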

The remaining steps to compute the explicit isogeny between $E$ and $E'$ are identical with Stages 2 and 3 of [15]. Index-calculus techniques are used to represent $\mathfrak{a}$ as a product of just a few ideals of small norm, and finally Vélu's formulae are applied. This can be accomplished in time $O(2^{N/4+\varepsilon})$, which is also negligible compared to $R_1$ and $R_2$.

5.3 Further Extension to Elliptic Curves with $\mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(b) \neq 0$

We further extend the set of elliptic curves over $\mathbb{F}_{2^{210}}$ for which any ECDLP instance can potentially be solved faster than Pollard's rho method solves the hardest ECDLP instances over $\mathbb{F}_{2^{210}}$. For this, we use Hess's recent generalization [20] of the GHS attack. It reduces instances of the ECDLP to instances of a discrete logarithm problem in the divisor class group of a curve $C$ over $\mathbb{F}_{2^{35}}$. Note that this curve $C$ is in general not hyperelliptic. Nevertheless, subexponential-time methods for discrete logarithm computation are available for such curves of large genus (see [20] and the references given there). However, we do not have an exact analysis of their running times.

Let $N = nl$ for some integers $n$ and $l$. Consider the elliptic curve $E: y^2 + xy = x^3 + ax^2 + b$ over $\mathbb{F}_{2^N}$ with $b \neq 0$, and let $\langle P \rangle$ be a subgroup of $E(\mathbb{F}_{2^N})$ of prime order $r$.

Let $q = 2^l$, and for $\gamma \in \mathbb{F}_{2^N}$ let $\mathrm{Ord}_\gamma(x)$ denote the unique monic polynomial $f \in \mathbb{F}_2[x]$ of least degree such that $f(\sigma)(\gamma) = 0$, where $\sigma$ is the power-$q$ Frobenius.

Let $\gamma_1, \gamma_2 \in \mathbb{F}_{2^N}$ be such that $b = (\gamma_1 \gamma_2)^2$. Let $c = 1/\gamma_1$; then $\gamma_2 = b^{1/2} c$. Let $s_i = \deg(\mathrm{Ord}_{\gamma_i})$ for $i = 1, 2$, and let $t = \deg(\mathrm{lcm}(\mathrm{Ord}_{\gamma_1}, \mathrm{Ord}_{\gamma_2}))$. Via a birational transformation, the defining equation of $E$ can be brought into the form $y^2 + y = 1/(cx) + a + b^{1/2} c x$.

Then Hess's generalization [20, Theorems 4, 5, 7] of the GHS attack allows one to reduce the ECDLP in $\langle P \rangle$ effectively to the DLP in a subgroup of order $r$ of the divisor class group of an explicitly computable curve $C$ over $\mathbb{F}_{2^l}$. The genus of $C$ is
$$g = 2^t - 2^{t - s_1} - 2^{t - s_2} + 1.$$
This requires that $\mathrm{Tr}_{\mathbb{F}_{2^N}/\mathbb{F}_2}(a) = 0$, or $\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(\gamma_1) \neq 0$, or $\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(\gamma_2) \neq 0$.

The following theorem, due to Hess [21], gives an effective method to obtain curves $C$ of relatively small genus. It applies to all but a small exceptional set of elliptic curves over $\mathbb{F}_{2^{210}}$. Here, $\Phi$ denotes the Euler phi function for polynomials, i.e., for $m(x) \in \mathbb{F}_2[x]$, $\Phi(m(x))$ is the number of elements $\gamma \in \mathbb{F}_{q^n}$ with $\mathrm{Ord}_\gamma(x) = m(x)$.
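The genus formula depends only on the degrees of $\mathrm{Ord}_{\gamma_1}$, $\mathrm{Ord}_{\gamma_2}$ and their least common multiple. The following minimal sketch is our own. It encodes polynomials over $\mathbb{F}_2$ as integers, with bit $i$ holding the coefficient of $x^i$. It does not check the trace condition under which Hess's reduction applies.

```python
def pmul(f: int, g: int) -> int:
    """Carry-less multiplication in F_2[x]."""
    r = 0
    while g:
        if g & 1:
            r ^= f
        g >>= 1
        f <<= 1
    return r


def pdivmod(f: int, g: int) -> tuple[int, int]:
    """Quotient and remainder of f by g != 0 in F_2[x]."""
    q, dg = 0, g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        shift = f.bit_length() - 1 - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f


def pgcd(f: int, g: int) -> int:
    while g:
        f, g = g, pdivmod(f, g)[1]
    return f


def deg(f: int) -> int:
    return f.bit_length() - 1


def hess_genus(ord1: int, ord2: int) -> int:
    """g = 2^t - 2^(t - s1) - 2^(t - s2) + 1 with s_i = deg Ord_i, t = deg lcm."""
    lcm = pdivmod(pmul(ord1, ord2), pgcd(ord1, ord2))[0]
    s1, s2, t = deg(ord1), deg(ord2), deg(lcm)
    return 2 ** t - 2 ** (t - s1) - 2 ** (t - s2) + 1


X1, X2 = 0b11, 0b111   # x + 1 and x^2 + x + 1
print(hess_genus(pmul(X1, X2), pmul(pmul(X1, X1), X2)))   # 14
```

The call shown evaluates the pair $(x^3 - 1, (x+1)^2(x^2+x+1))$, which gives genus 14.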

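Theorem 12 Let $n = n_1 n_2$, $K = \mathbb{F}_{q^n}$, $k = \mathbb{F}_q$ and $K_1 = \mathbb{F}_{q^{n_1}}$. Let $\beta \in K$.

(i) There exist $\gamma_1, \gamma_2 \in K$ with $\beta = \gamma_1 \gamma_2$ and
$$\mathrm{Ord}_{\gamma_1}(x) \mid (x^{n_1} - 1) \quad\text{and}\quad \mathrm{Ord}_{\gamma_2}(x) \mid \frac{(x - 1)(x^n - 1)}{x^{n_1} - 1}. \qquad (6)$$

(ii) If $\mathrm{Tr}_{K/K_1}(\beta) \neq 0$, then $\gamma_1 = \mathrm{Tr}_{K/K_1}(\beta)$ and $\gamma_2 = \beta/\gamma_1$ satisfy (6).

(iii) If $\mathrm{Tr}_{K/K_1}(\beta) = 0$, then $\gamma_1 = 1$ and $\gamma_2 = \beta$ satisfy (6).

(iv) Let $m_1, m_2 \in \mathbb{F}_2[x]$ be such that $m_1 \mid (x^{n_1} - 1)$ and $m_2 \mid (x - 1)(x^n - 1)/(x^{n_1} - 1)$, and suppose $\mathrm{Tr}_{K/K_1}(\beta) \neq 0$. The number of $\beta = \gamma_1\gamma_2$ such that $\mathrm{Ord}_{\gamma_1}(x) = m_1(x)$ and $\mathrm{Ord}_{\gamma_2}(x) = m_2(x)$ is $\Phi(m_1(x))\,\Phi(m_2(x))/(q - 1)$.

(v) In the case of (ii) we have:

(a) $k(\beta) = k(\gamma_1, \gamma_2)$.

(b) $\mathrm{Tr}_{K/k}(\gamma_2) \neq 0$ if and only if $n_1$ is odd.

(c) $\mathrm{Tr}_{K/k}(\gamma_1) \neq 0$ if and only if $v_{x+1}(\mathrm{Ord}_{\gamma_1}(x)) = 2^{v_2(n)}$.

In general, $v_{x+1}(\mathrm{Ord}_{\gamma_1}(x)) \le 2^{v_2(n)}$.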

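Proof. We first note that for $\gamma \in K$, $\mathrm{Ord}_\gamma(x)$ divides $x^{n_1} - 1$ if and only if $\gamma \in K_1$. Further, $\mathrm{Ord}_\gamma(x)$ divides $(x - 1)(x^n - 1)/(x^{n_1} - 1)$ if and only if $\mathrm{Tr}_{K/K_1}(\gamma + \gamma^q) = 0$, and the latter implies $\mathrm{Tr}_{K/K_1}(\gamma) \in k$.

(i) This follows from (ii) and (iii).

(ii) Let $\gamma_1 = \mathrm{Tr}_{K/K_1}(\beta) \neq 0$ and $\gamma_2 = \beta/\gamma_1$. Then $\gamma_1 \in K_1$. Also, $\mathrm{Tr}_{K/K_1}(\gamma_2) = \mathrm{Tr}_{K/K_1}(\beta)/\gamma_1 = 1$, and by the additivity of the trace function $\mathrm{Tr}_{K/K_1}(\gamma_2 + \gamma_2^q) = 0$. Now, for $g(x) = (x^n - 1)/(x^{n_1} - 1) = 1 + x^{n_1} + \cdots + x^{(n_2 - 1)n_1}$ we have $g(\sigma)(\alpha) = \mathrm{Tr}_{K/K_1}(\alpha)$ for all $\alpha \in K$. Thus,
$$\big((x - 1)g(x)\big)(\sigma)(\gamma_2) = \mathrm{Tr}_{K/K_1}(\gamma_2^q) + \mathrm{Tr}_{K/K_1}(\gamma_2) = 0.$$

(iii) If $\mathrm{Tr}_{K/K_1}(\beta) = 0$, then, for any $\gamma_1 \in K_1$ and $\gamma_2 = \beta/\gamma_1$, we also have $\mathrm{Tr}_{K/K_1}(\gamma_2) = 0$.

(iv) Suppose $\mathrm{Tr}_{K/K_1}(\beta) \neq 0$. For $i = 1, 2$, there are $\Phi(m_i(x))$ elements $\gamma_i$ such that $\mathrm{Ord}_{\gamma_i}(x) = m_i(x)$. Now, let $\beta = \gamma_1\gamma_2 = \gamma_1'\gamma_2'$, where $\mathrm{Ord}_{\gamma_i}(x) = \mathrm{Ord}_{\gamma_i'}(x) = m_i(x)$. Since $m_1(x)$ divides $x^{n_1} - 1$, we have $\gamma_1' = \gamma_1/\lambda$ for some $\lambda \in K_1$, so that $\gamma_2' = \lambda\gamma_2$. Now,
$$\begin{aligned}
0 &= \mathrm{Tr}_{K/K_1}\big(\gamma_2' + (\gamma_2')^q\big) = \mathrm{Tr}_{K/K_1}\big(\lambda\gamma_2 + (\lambda\gamma_2)^q\big) \\
&= \mathrm{Tr}_{K/K_1}(\lambda\gamma_2 + \lambda\gamma_2^q) + \mathrm{Tr}_{K/K_1}\big((\lambda + \lambda^q)\gamma_2^q\big) \\
&= \lambda\,\mathrm{Tr}_{K/K_1}(\gamma_2 + \gamma_2^q) + (\lambda + \lambda^q)\,\mathrm{Tr}_{K/K_1}(\gamma_2^q) = (\lambda + \lambda^q)\,\mathrm{Tr}_{K/K_1}(\gamma_2^q).
\end{aligned}$$
Since $\mathrm{Tr}_{K/K_1}(\beta) \neq 0$, also $\mathrm{Tr}_{K/K_1}(\gamma_2^q) = \big(\mathrm{Tr}_{K/K_1}(\beta)/\gamma_1\big)^q \neq 0$. Thus, $\lambda + \lambda^q = 0$, and therefore $\lambda \in \mathbb{F}_q$, $\lambda \neq 0$. Conversely, $f(\sigma)(\lambda\gamma) = \lambda f(\sigma)(\gamma)$ for any $\lambda \in \mathbb{F}_q$ and $f \in \mathbb{F}_2[x]$. Hence, if $\lambda \in \mathbb{F}_q$ and $\lambda \neq 0$, then $\mathrm{Ord}_{\lambda\gamma}(x) = \mathrm{Ord}_\gamma(x)$.

(v) (a) We have $\beta \in k(\gamma_1, \gamma_2)$, since $\beta = \gamma_1\gamma_2$. Further, $\gamma_1, \gamma_2 \in k(\beta)$, since $\gamma_1 = \mathrm{Tr}_{K/K_1}(\beta)$ and $k(\beta)$ is Galois.

(b) We have
$$\mathrm{Tr}_{K/k}(\gamma_2) = \mathrm{Tr}_{K/k}(\beta/\gamma_1) = \mathrm{Tr}_{K_1/k}\big(\mathrm{Tr}_{K/K_1}(\beta/\gamma_1)\big) = \mathrm{Tr}_{K_1/k}(1) = [K_1 : k] = n_1,$$
where the last two expressions are read as elements of $k$.

(c) We have $v_{x+1}(x^n - 1) = 2^{v_2(n)}$. If $v_{x+1}(\mathrm{Ord}(\gamma_1)) < 2^{v_2(n)}$, then $\mathrm{Ord}(\gamma_1) \mid (x^n - 1)/(x - 1)$, hence $\mathrm{Tr}_{K/k}(\gamma_1) = 0$. The result follows. □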
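Parts (ii) and (v)(b) can be checked exhaustively in a small instance. This sketch is our own and uses the following parameters:

- $q = 2$, $n = 6$ and $n_1 = 3$;
- $K = \mathbb{F}_{2^6}$, again modelled as $\mathbb{F}_2[z]/(z^6+z+1)$;
- $K_1 = \mathbb{F}_{2^3}$ and $k = \mathbb{F}_2$, with $\sigma$ being squaring.

A divisibility $\mathrm{Ord}_\gamma \mid m$ is tested as $m(\sigma)(\gamma) = 0$, which is equivalent.

```python
MOD, N, N1 = 0b1000011, 6, 3   # K = F_2[z]/(z^6 + z + 1), K_1 = F_{2^3}


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r


def gf_pow(a: int, e: int) -> int:
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def apply_poly(f: int, a: int) -> int:
    """f(sigma)(a) with sigma = squaring; bit i of f <-> x^i."""
    r = 0
    while f:
        if f & 1:
            r ^= a
        f >>= 1
        a = gf_mul(a, a)
    return r


M1 = 0b1001        # x^3 - 1 = x^{n_1} - 1
M2 = 0b11011       # (x - 1)(x^6 - 1)/(x^3 - 1) = x^4 + x^3 + x + 1
TR_K_k = 0b111111  # 1 + x + ... + x^5, i.e. Tr_{K/k}


def check_part_ii() -> int:
    """Verify (6) and (v)(b) for every beta with Tr_{K/K_1}(beta) != 0."""
    checked = 0
    for beta in range(1, 2 ** N):
        gamma1 = beta ^ gf_pow(beta, 2 ** N1)              # beta + beta^8
        if gamma1 == 0:
            continue
        gamma2 = gf_mul(beta, gf_pow(gamma1, 2 ** N - 2))  # beta / gamma1
        assert gf_mul(gamma1, gamma2) == beta
        assert apply_poly(M1, gamma1) == 0                 # Ord_{gamma1} | x^3 - 1
        assert apply_poly(M2, gamma2) == 0                 # Ord_{gamma2} | M2
        assert apply_poly(TR_K_k, gamma2) == N1 % 2        # (v)(b)
        checked += 1
    return checked


print(check_part_ii())   # 56
```

The value 56 counts the nonzero $\beta$ outside the kernel of $\mathrm{Tr}_{K/K_1}$, since that kernel has $2^6/2^3 = 8$ elements.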



Corollary 13 For any elliptic curve $E: y^2 + xy = x^3 + ax^2 + b$ over $\mathbb{F}_{q^6}$ with $\mathrm{Tr}_{\mathbb{F}_{q^6}/\mathbb{F}_{q^3}}(b) \neq 0$, Hess's generalization [20] of the GHS attack can be used to reduce the ECDLP in $E(\mathbb{F}_{q^6})$ to the DLP in the divisor class group of a curve over $\mathbb{F}_q$ of genus at most 14.

Proof. Let $\beta = b^{1/2}$. We apply Theorem 12 with $n = 6$ and $n_1 = 3$. With $\gamma_1 = \mathrm{Tr}_{\mathbb{F}_{q^6}/\mathbb{F}_{q^3}}(\beta)$ and $\gamma_2 = \beta/\gamma_1$, Hess's generalization of the GHS attack applies and yields a curve over $\mathbb{F}_q$ of genus $g = 2^t - 2^{t-s_1} - 2^{t-s_2} + 1$, where $s_1, s_2, t$ are as defined above. Table 3 lists all possible values of $g$ for the various choices of $\mathrm{Ord}_{\gamma_1}$ and $\mathrm{Ord}_{\gamma_2}$. □



Table 3. Genera in Hess's generalization of the GHS attack (see Corollary 13) for fields $\mathbb{F}_{q^6}$.

    Ord_γ1(x)      s1   Ord_γ2(x)                  s2   t   g
    x + 1          1    x + 1                      1    1   1
    x + 1          1    (x + 1)^2                  2    2   2
    x + 1          1    x^2 + x + 1                2    3   3
    x + 1          1    (x + 1)(x^2 + x + 1)       3    3   4
    x + 1          1    (x + 1)^2 (x^2 + x + 1)    4    4   8
    x^2 + x + 1    2    (x + 1)^2                  2    4   9
    x^2 + x + 1    2    x^2 + x + 1                2    3   5
    x^2 + x + 1    2    (x + 1)(x^2 + x + 1)       3    3   6
    x^2 + x + 1    2    (x + 1)^2 (x^2 + x + 1)    4    4   12
    x^3 - 1        3    (x + 1)^2                  2    4   11
    x^3 - 1        3    (x + 1)(x^2 + x + 1)       3    3   7
    x^3 - 1        3    (x + 1)^2 (x^2 + x + 1)    4    4   14



From Theorem 12(iv) and [30, Theorem 5] we see that the vast majority of elliptic curves over $\mathbb{F}_{q^6}$ yield a genus 14 curve over $\mathbb{F}_q$.
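The "vast majority" claim can be quantified with Theorem 12(iv). This sketch makes the same assumptions as the earlier order-count sketch: $q = 2^{35}$, the order-count formula from the normal basis theorem, and $x^2+x+1$ irreducible over $\mathbb{F}_q$. It computes the share of $\beta$ with $\mathrm{Tr}_{K/K_1}(\beta) \neq 0$ whose pair $(\mathrm{Ord}_{\gamma_1}, \mathrm{Ord}_{\gamma_2})$ equals $(x^3 - 1, (x+1)^2(x^2+x+1))$, which is the genus-14 row of Table 3. The total $q^6 - q^3$ is the number of $\beta \in \mathbb{F}_{q^6}$ whose trace to $\mathbb{F}_{q^3}$ is nonzero.

```python
from fractions import Fraction


def order_count(factors: list[tuple[int, int]], q: int) -> int:
    """Phi(m) for m = prod p_i^e_i, given as (deg p_i, e_i) pairs."""
    count = 1
    for deg_p, e in factors:
        count *= q ** (deg_p * e) - q ** (deg_p * (e - 1))
    return count


q = 2 ** 35
phi_m1 = order_count([(1, 1), (2, 1)], q)   # m1 = x^3 - 1 = (x+1)(x^2+x+1)
phi_m2 = order_count([(1, 2), (2, 1)], q)   # m2 = (x+1)^2 (x^2+x+1)
genus14 = phi_m1 * phi_m2 // (q - 1)        # Theorem 12(iv)
total = q ** 6 - q ** 3                     # beta with Tr_{K/K_1}(beta) != 0
share = Fraction(genus14, total)
print(float(1 - share))                     # roughly 1/q = 2^-35
```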

Remark 14 (application of Theorem 12 to fields $\mathbb{F}_{2^{4l}}$) For $n = 4$ we can use the same technique to show that Hess's generalization of the GHS attack can be used to produce curves over $\mathbb{F}_{2^l}$ of genus at most 6. In fact, for most such curves the genus is equal to 6, and in general the resulting curve will be non-hyperelliptic.

5.4 Comparisons 

Table 4 shows the costs $R_p$, $R_1$, $R_2$ for the attacks on the ECDLP for elliptic curves defined over $\mathbb{F}_{2^{210}}$ discussed in this section.

Table 4. Time estimates for Pollard's rho method for solving an ECDLP instance in $E(\mathbb{F}_{2^{210}})$, and for the relation generation and matrix stages of the Enge-Gaudry algorithm for solving an HCDLP instance in $J_C(\mathbb{F}_{2^{42}})$ and $J_C(\mathbb{F}_{2^{35}})$, where $C$ is a genus 16 hyperelliptic curve. $c_{210}$, $c_l$ and $c_r$ are the relative times for a multiplication in $\mathbb{F}_{2^{210}}$, $\mathbb{F}_{2^l}$, and modulo a 210-bit prime, respectively.

    N     n   l    R_p/c_210    R_1/c_l    R_2/c_r    c_210   c_l   c_r   R_p          R_1       R_2
    210   5   42   2^{107.5}                          10.3    1.0   8.0   2^{110.5}
    210   6   35   2^{107.5}    2^{90}     2^{72}     10.3    1.0   8.0   2^{110.5}    2^{90}    2^{75}

Consequently, for all elliptic curves over $\mathbb{F}_{2^{210}}$, the ECDLP can be solved faster than Pollard's rho method solves the hardest instances. For about a quarter of all curves over $\mathbb{F}_{2^{210}}$, namely those with $\mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(a) = \mathrm{Tr}_{\mathbb{F}_{2^{210}}/\mathbb{F}_2}(b) = 0$, the ECDLP can be solved about $2^{20}$ times faster than with Pollard's rho method. As argued in Section 5.3, for essentially all elliptic curves over $\mathbb{F}_{2^{210}}$, the ECDLP can presumably be solved significantly faster than with Pollard's rho method, although an exact analysis has not been conducted.

6 Conclusions 

We have argued that the fields $\mathbb{F}_{2^N}$, where $N \in [185, 600]$ is divisible by 5, are weak for ECC. The fundamental open problem is to determine whether there are any fields that are bad for ECC. We have provided some evidence that the field $\mathbb{F}_{2^{210}}$ is a prime candidate for being bad.

Another candidate for a bad field is $\mathbb{F}_{2^{161}}$. For a subset of the $2^{162}$ isomorphism classes of elliptic curves $E$ over $\mathbb{F}_{2^{161}}$, the GHS reduction yields a hyperelliptic curve $C$ of genus (7 or) 8 over $\mathbb{F}_{2^{23}}$, where the HCDLP is feasible. In our notation, $R_p$ is measured in units of $c_E$, $R_1$ in units of $c_J + c_s$, and $R_2$ in units of $c_r$. Here the units are defined as follows:

- $c_E$ is the time to perform an elliptic curve addition in $E(\mathbb{F}_{2^{161}})$;
- $c_J$ is the time to perform an addition in $J_C(\mathbb{F}_{2^{23}})$;
- $c_s$ is the time to test whether a monic polynomial $a \in \mathbb{F}_{2^{23}}[x]$ of degree 8 is 1-smooth;
- $c_r$ is the time to perform a multiplication modulo a 160-bit prime.

Suppose an arbitrary ECDLP instance over $\mathbb{F}_{2^{161}}$ could be efficiently mapped to an ECDLP instance for an isogenous elliptic curve in the aforementioned class. One would then conclude that $\mathbb{F}_{2^{161}}$ is bad for ECC (see also [15] and [28, Remark 20]). No such mapping is known so far (see also [43]).

An important open question in hyperelliptic curve cryptography is whether there are algorithms for solving the HCDLP that are faster than the Enge-Gaudry algorithm. Because of the relevance to solving the ECDLP, even improvements by a constant factor would be of interest. For example, the possibility of using sieving (see [9]) to generate relations needs to be explored further.

Galbraith [14] has shown that Weil descent can be used to attack the HCDLP 
over some low genus hyperelliptic curves defined over characteristic two finite 
fields of composite extension degrees. An open problem is to determine whether 
there are any weak fields for genus two hyperelliptic curve cryptography. 